To my students and the Sage community;
let’s keep exploring together.

To Everyone


Welcome to Number Theory! This book is an introduction to the theory and 
practice of the integers, especially positive integers — the numbers. We focus on 
connecting it to many areas of mathematics and dynamic, computer-assisted 
interaction. Let’s explore! 

Carl Friedrich Gauss, a great mathematician of the nineteenth century, is 
said to have quipped⁸ that if mathematics is the queen of the sciences, then
number theory is the queen of mathematics (hence the title of [E.5.4]). If you 
don’t yet know why that might be the case, you are in for a treat. 

Number theory was (and is still occasionally) called ‘the higher arithmetic’, 
and that is truly where it starts. Even a small child understands that there is 
something interesting about adding numbers, and whether there is a biggest 
number, or how to put together fact families. Well before middle school many 
children will notice that some numbers don’t show up in their multiplication 
tables much, or learn about factors and divisors. One need look no further 
than the excellent picture book You Can Count on Monsters [E.6.1] by Richard 
Evans Schwartz to see how compelling this can be. 

Later on, perfect squares, basic geometric constructs, and even logarithms⁹
all can be considered part of arithmetic. Modern number theory is, at its 
heart, just the process of asking these same questions in more and more general 
situations, and more and more interesting situations. 

They are situations with amazing depth. A sampling: 


• The question of what integers are possible areas of a right triangle seems
very simple. Who could have guessed it would lead to fundamental advances
in computer representation of elliptic curves?

• There seems to be no nice formula for prime numbers, else we would have
learned it in middle school. Yet who would have foreseen they are so very
regular on average?

• Taking powers of whole numbers and remainders while dividing are
elementary and tedious operations. So why should taking remainders of
tons of powers of whole numbers make online purchases more secure?


⁸ In Wolfgang Sartorius von Waltershausen’s rather lengthy and nearly hagiographic (‘his
undying name ... whom no contemporary nation can place as an equal beside’) biography
Gauss zum Gedächtnis; see the bottom of page 79 at the Internet Archive (search archive.org,
link kindly provided by Neil McKay at the University of New Brunswick).

⁹ www.youtube.com/watch?v=N-7tcTIrers

This book is designed to explore that fascinating world of whole numbers.
It covers all the ‘standard’ questions, and perhaps some not-quite-as-standard 
topics as well. Roughly, it covers the following broad categories of topics. 


• Basic questions about integers
• Basic congruence arithmetic
• Units, primitive roots, and Euler’s function (via groups)
• Basics of cryptography, primality testing, and factorization
• Integer and rational points on conic sections
• The theory and practice of quadratic residues
• Basics of arithmetic functions
• The prime counting function and related matters
• Connecting calculus to arithmetic functions


Finally, it won’t take long to notice that the way in which this book is 
constructed emphasizes connections to other areas of math and encourages 
dynamic interaction. (See the note To the Instructor.) It is my hope that all 
readers will find this ‘in context and interactive’ approach enjoyable. 


To the Student 


Hi! Not too many students read this bit in textbooks, but I hope you do, and 
I hope you circle stuff you think is important. In pen. 

Doing math without writing in the book (or on something, if you’re only 
using an electronic version) is sort of like reading much literature (like
Shakespeare or Homer) or many religious texts (like the Psalms or Vedas) without
paying attention to the spoken aspect. It’s possible, and we all may have done 
it (some successfully), but it’s sort of missing the point. 

So read this book and write in it. My students do. They even like it. 

Here are three things that will lead to success with this book. 


• You should like exploring numbers and playing with them. If you were
the kind of kid who added

$$1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + \cdots$$

on your calculator when you were bored to see if there would be an
interesting pattern, and actually liked it, you will like number theory. If
you then tried

$$2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7 \cdot 8 \cdot 9 \cdot 10 \cdots$$

you will really like it.


• I also hope you are open to using computers to explore math and check
conjectures. As Picasso said, “They can only give you answers”, but
oh what answers! We use the SageMath¹⁰ system, one that will grow
with you and that will always be free to use (for several meanings¹¹ of
the word free¹²). You don’t have to know how to program to use this,
though it’s useful. Plus, you are using number theory under the hood
anyway if you use the internet much, so why not?


• Finally, you should want to know why things are true. I assume a standard
introduction to proof course as background, but different people
are ready in different ways for this. If you are reasonably familiar with
proofs by induction and contradiction, and have some basic experience
with sets and relations, that is a good start. Some good free resources
online include A Gentle Introduction to the Art of Mathematics [E.3.2]
and Book of Proof [E.3.1].

Some of the proofs will be hairy, and some exercises challenging. (Not
all!) Do not worry; by trying, you will get better at explaining why
the things you are convinced of are true. And that is a very useful skill.
(Provided you are convinced of them; if not, go back to the first bullet
point and play with more examples!)

¹⁰ www.sagemath.org
¹¹ opensource.org/osd
¹² www.gnu.org/philosophy/free-sw.en.html


Remark 0.0.1 A final note to the student. As a last note before you dig 
in, if you think that it is worth exploring the possible truth (see Section 25.3) 
of 


$$1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + \cdots = -\frac{1}{12},$$

or if as a kid you did

$$2^{3^{4^{\cdot^{\cdot^{\cdot}}}}}$$


to see what would happen, then maybe you should become a mathematician. 
In that case, click on all links in the text and find a cool problem that interests 
you! 


To the Instructor 


Assuming that the reader of this preface is an instructor of an actual course, 
may I first say thank you for introducing your students to number theory! 
Secondly of course I’m grateful for your at least briefly considering this text. 

In that case, gentle reader, you may be asking yourself, “Why on earth 
yet another undergraduate number theory text?” Surely all of these topics 
have been covered in many excellent texts? (See the preface To Everyone for 
a brief topic list, and the Table of Contents for a more detailed one.) And 
surely there is online content, interactive content, and all the many topics here 
in other places? Why go to the trouble to write another book, and then to 
share it? These are excellent questions I have grappled with myself for the past 
decade. 

There are two big reasons for this project. The first is reminiscent of
Tertullian’s old quote about Athens and Jerusalem; what has arithmetic to do
with geometry? (Or calculus, or combinatorics, or anything?) At least in 
the United States, away from the most highly selective institutions (and in my 
own experience, there as well), undergraduate mathematics can come across as 
separate topics connected by some common logical threads, and being at least 
vaguely about ‘number’ or ‘magnitude’, but not necessarily part of a unified 
whole. 

When I first taught this course, I was dismayed at how few texts really fully 
tackled the geometry, algebra, and analysis inherent in number theory. Many 
do one or two (especially algebra, since number theory might often be a second 
course in abstract algebra), but few attacked all connections. Still, there are 
some which do, and I even found Elementary Number Theory by Jones and 
Jones [E.2.1] which does a very good job of this, though at a slightly higher 
level of sophistication than I found my students ready for. Those familiar with 
it will find that my presentation of certain topics (e.g. arithmetic functions, 
the zeta function) and some topic order is influenced by it; for certain proofs 
(especially in Dirichlet series) the proofs there and in [E.4.6] are the only ones I 
could find! I try to point out all such cases, and I have substantively modified 
even those in ways more appropriate for typical US undergraduates, as well as 
with somewhat different emphases. 

Given my first goal, I would have happily used that text with some extra 
details for my students, were it not for the magic and wonder of the internet. 
How could I not harness this to have my students do approximations to the size 
of computations that their browsers are constantly doing as they go shopping 
on the web? Having found Sage¹³, I found it hard to avoid using it whenever I
could, and encouraging students to do the same to explore things like Euler’s
φ function (as I encourage yours to do in Section 9.2 by hand).


¹³ www.sagemath.org/

Interactivity and visualization are becoming common currency in mathematics
education. In calculus and lower-level courses this has been true for some
time, but even in abstract algebra there are books like Nathan Carter’s
Visual Group Theory [E.6.2], specialized software projects like PascGalois¹⁴, and
many general applets (including ones from the Wolfram Demonstrations or
Maple Möbius projects). This has been coming into number theory too,
naturally, beyond the programming projects many books have included. An early
number theory text involving explicit programs (and a CD-ROM!) written for
extensive course work was [E.4.7], and the first book invoking extensive use of
Sage commands was probably the founder’s own [E.2.3]. Very recently (in fact,
after the unofficial release of this text) the book [E.2.10] (which has similar
content and aims to the current work, though at a somewhat higher level)
appeared in second edition with complete SageMath worksheets on its website¹⁵,
which can be used on CoCalc¹⁶ (or on a local Docker version of CoCalc¹⁷).
Hence the time is more than right for a fully online resource.

So my second goal for this book is to bring online interactivity into a 
mainstream number theory text. It is wonderful to see students with an interest 
in the arts respond to the dynamic visualization in Sage interacts, while those 
with interests in computer science love to ask questions about how to view the 
source code or some of the details of representing large numbers. And all the 
students have access to computations from simple ones involving the aliquot 
parts function to the full Riemann formula for the prime number function. 

Why should you not use this book? First, I make few claims to topical or 
mathematical originality¹⁸. The ordering is somewhat different than usual, I
include a few topics I haven’t seen addressed adequately very often in truly 
introductory texts (notably a beginning of the geometry of numbers and long- 
term averages of arithmetic functions), and I have created many visualization 
and exploration oriented applets. 

At the low end of other reasons you might not use it, some topics of great
importance which are perfect for beginners (especially partitions and continued
fractions) are absent. You can’t cover everything in a semester, after all,
and I have shied away a bit from more purely combinatorial stuff, though I
hope to steadily add slightly more in successive editions¹⁹. At the high end of
preparation, I do not and cannot expect a course in abstract algebra or complex
(or even real) analysis for my students, and so this book reflects that reality.
Knowing about proofs by induction and contradiction, as well as basics of sets,
integers, and relations, is what I can assume. In fact, I have great
recommendations for you if you know all your students can do contour integration or are
ready to define a number field; see References and Further Resources. Finally,
I don’t have a corporation behind me.

On the other hand, I think you should consider using it. This is class- 
tested material for standard topics (plenty for a semester-long course at most 
institutions), and not beholden to any interests beyond being a good resource 
for instructors in ‘mainstream’ undergraduate math programs in the United 
States. There are plenty of exercises (though not a surfeit, so feel free to
supplement), fun links, and hopefully a quirky and engaging sense of wonder
and exploration. The price is also right. Finally, I don’t have a corporation
behind me.

¹⁴ faculty.salisbury.edu/~despickler/pascgalois/

¹⁵ tvazzana.sites.truman.edu/introduction-to-number-theory/

¹⁶ cocalc.com

¹⁷ github.com/sagemathinc/cocalc-docker

¹⁸ I have tried hard to credit any non-standard proofs which are essentially in the form
I found them, as well as many of those which I have modified for my students’ needs. I
appreciate forbearance (and notification!) if I have missed any such citations so that I may
correct them.

¹⁹ See [E.2.11] for a nice introduction in a more combinatorial vein, particularly to partition
identities.

Should you choose to use this text, I have only a few recommendations for 
how to use it (see also my notes To the Student). 


• Encourage in-class exploration. Put away books, turn off the computers,
and just try stuff out. Create your own worksheet to explore (say) the
Möbius function or solutions to linear Diophantine equations. In short,
make sure your students see mathematics as a dynamic enterprise,
particularly because so many of the theorems involved are highly abstract.

• Less is more. I will often pick one representative proof in a section,
project it on the screen, and then really follow it through on an adjacent
blackboard with specific numbers (such as p = 13, which is just big
enough to be interesting but not so big as to be overwhelming).

• Use computer examples judiciously. Sage (or any other system) can just
as easily become a Delphic oracle (pun intended) spewing forth cryptic
utterances as a useful tool to help create and solve conjectures. You’re
possibly doing your students a disservice if you don’t use it at all, but
despite having written this text with Sage in mind throughout, I don’t
regard its use as completely essential. Number theory in this form has
been around since Euclid, so the past thirty years of mass-market
computation is a drop in the bucket of time. If you want a true inquiry-based
approach, I like the text Number Theory through Inquiry [E.2.5] a lot.

• Note the Sage notes (full list at the List of Sage notes). Especially if you
have more than just a few students who have a little programming
experience, this is a perfect course to find projects to challenge them with, such
as those in the venerable [E.2.4]. The Sage notes gently remind or give
short introductions to some aspects of how to use Sage and Python²⁰
(the language Sage is based on). They are not formally structured or
arranged, or comprehensive; if you are looking for this, you should
supplement your course with a real basic programming text in Python, such
as [E.3.7] or [E.3.8]. (The already-initiated should note that as of January
2020 this book has been updated to Sage 9 and Python 3, so some
commands, especially those involving print(), may not work with certain
earlier versions of Sage.)

• Use the exercises, and ones outside the book if you want. There are
exercises for each chapter, of varying difficulty levels (in the grand tradition
of upper-level math texts, I do not provide solutions). In general,
assigning daily, collecting weekly seems to be a decent model, though be sure
to give students ample warning as to which ones will be collected! The
last few chapters’ material is more advanced, and there are
correspondingly fewer possible exercises. I find this to be a good time for a small
project in the history of number theory; especially if you have students
from several different cultural heritages, having them discover where it
comes up in theirs (it nearly always does) has been a perennial favorite.


There are no sections marked as optional, or table of dependencies, though 
these should be pretty similar to most elementary texts. (I do pretty much 
everything in my own course, picking results or sections to skip on the fly if 
time or the students seem to require this.) Here are some minor suggestions, 
though. 


²⁰ www.python.org

• If you are teaching a shorter course or wish to spend more time on some
topic, the chapters on Beyond Sums of Squares and More on Prime
Numbers are certainly optional in this sense.

• The chapters concerning Points on Curves and Long-Term Function
Behavior are not optional in my view of number theory, but may be viewed
as ‘selected topics’.

• The introductory (short) chapters 1 and 18 should not be considered
optional, but may be emphasized or not to instructor taste. The point is
just to motivate what we are doing before getting to formal definitions.

• If you don’t like cryptography or believe (like Hardy) that there are no
applications to number theory, you can certainly create a nearly
application-free course by skipping the chapters on An Introduction to Cryptography
and Some Theory Behind Cryptography.

• I don’t consider the last several chapters on the prime counting function
and other arithmetic functions connecting to calculus to be optional, but
I have the luxury of having mostly juniors and seniors for a full semester.
In a quarter course or one aimed more at sophomores (in the United
States), one should still at the very least spend a couple days at the end
of the course talking about these topics, perhaps discussing sections 21.2
and 21.3, and smatterings of Chapter 25.


As a final note, I hope you enjoy using the text as much as I’ve enjoyed 
teaching from it. Everyone should have that day where a student’s jaw drops 
from a cool theorem displayed visually, or when the students are working so 
intently on an in-class project that they don’t even notice the class period end. 
It’s been my privilege to have that happen, and my hope is this text can bring 
you closer to that goal.

Chapter 1


Prologue 


What is number theory? Briefly, it is the study of the integers and questions 
arising from them. 


Definition 1.0.1 The Integers. The set of counting numbers is denoted

$$\mathbb{N} = \{0, 1, 2, 3, 4, \ldots\}.$$

Note that in this text, this set begins at zero¹. The integers are the set of
positive and negative counting numbers:

$$\mathbb{Z} = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}.$$

This is a fairly dry definition, though. The best way to find out what this 
definition means is to try to answer some questions about integers! 


1.1 A First Problem 


Let’s start! Suppose you have lots of left-over postage stamps² that are of just
a few different denominations. It could be fun to see what amounts you could
make from them.

To be concrete, let’s assume first that all your stamps are numbered 2¢ and
3¢. Here are two questions we could ask. They are mathematically equivalent,
but might take your exploration in two very different directions!

Question 1.1.1 Suppose you only have stamps (or some other currency-like
item) available in 2¢ and 3¢ amounts.

• Which denominations of postage can you get by combining just these two
kinds of stamps?

• Which denominations can you not get with just these two kinds?


¹ You can search Mathematics Stack Exchange, Wikipedia, and many other interesting
sites for discussions about this. Authors disagree, though number theory texts tend to go
with the older tradition of only counting positive integers among the “natural numbers”, both
because they count things and because they are a natural set to work with. With the advent
of computers and (often) zero-based counting, as well as set theory, there is more variety, and
it will be convenient to start at zero here since we integrate the use of a zero-based computer
language so much. Apparently the ISO standard (www.iso.org/standard/64973.html) also
begins counting at zero.

² Perhaps because you only use email or texting now; too bad for you!

Once you’ve thought about that, try the same problem with 2¢ and 4¢ stamps.
What is the same, what is different?

Now let’s get to a nontrivial case; what about with 3¢ and 4¢ stamps? In
this case, after some experimentation, it looks like only 1, 2, and 5 are not
possible, so anything six or above is possible. We call this number (in this
case, 6) the conductor of the set {3, 4}.

What we are really asking, as might be clear by now, is which positive 
integers n are impossible (or possible) to write in the form n = 3x + 4y, for
nonnegative integers x and y. This is also sometimes called the Frobenius³ or
coin problem. 

Continue trying this with different small pairs of positive integers (see also 
Exercise 1.4.5-Exercise 1.4.7). For each pair, pay attention to two things: 


• What is the conductor of the pair? (You might want to ask whether there
is a conductor!)

• How many numbers lower than the conductor cannot be written in this
way as a sum with this pair?
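
If you would like to let a computer do some of the experimenting, here is a minimal brute-force sketch in plain Python. The function names `representable` and `conductor_candidate` and the search bound `limit` are assumptions made up for this illustration; they do not come from Sage or from this text. A finite search only gives evidence, so the second function reports a candidate. It refuses to answer when the run of representable numbers ending at `limit` is shorter than min(a, b), since adding a to each member of a run of a consecutive representable numbers is what produces the next a of them.

```python
def representable(a, b, limit):
    """Sorted amounts n <= limit with n = a*x + b*y for integers x, y >= 0."""
    found = set()
    for x in range(limit // a + 1):
        for y in range((limit - a * x) // b + 1):
            found.add(a * x + b * y)
    return sorted(found)


def conductor_candidate(a, b, limit=100):
    """Smallest c such that every n with c <= n <= limit is representable.

    Returns None when the final run of representable numbers is shorter
    than min(a, b); the bound `limit` is an arbitrary choice for this sketch.
    """
    possible = set(representable(a, b, limit))
    c = limit + 1
    while c > 0 and (c - 1) in possible:
        c -= 1
    if limit + 1 - c < min(a, b):
        return None
    return c


if __name__ == "__main__":
    for a, b in [(2, 3), (2, 4), (3, 4)]:
        got = representable(a, b, 20)
        missing = [n for n in range(1, 21) if n not in got]
        print((a, b), "missing up to 20:", missing,
              "candidate:", conductor_candidate(a, b))
```

For the pair {3, 4} this reports 1, 2, and 5 as missing and 6 as the candidate, matching the pencil-and-paper experiment; for {2, 4} it reports no candidate at all.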


1.2 Review of Previous Ideas 


Before going further, we need a bit of review. The following three topics may 
be considered prerequisites for the course. 


1.2.1 Well-Ordering 


The first principle is both simple and deep. It is a deep property of the positive 
integers, but we give it its usual name. 


Axiom 1.2.1 Well-Ordering Principle. Any nonempty set of positive 
integers has a least/smallest element. 

More generally, any nonempty set of integers which is bounded below has a
least element; in particular this holds for nonempty subsets of $\mathbb{N}$
(recall Definition 1.0.1).

Let’s use it as an example to prove the following fact which you probably 
didn’t know required proof. 


Fact 1.2.2 Consecutive Integers. There are no integers between 0 and 1.
Proof. This proof proceeds by contradiction. Assume there are some such
integers, and let

$$S = \{x \in \mathbb{Z} \mid 0 < x < 1\}.$$

This set must then have a least element $a$, and $0 < a < 1$. If we multiply
through by $a$ (which is positive) then we obtain $0 < a^2 < a$.

Thus $a^2$ is another integer such that $0 < a^2 < 1$, so $a^2 \in S$, but we also know
that $a^2 < a$. So $a^2$ is an element of $S$ which is less than the least element of $S$.
That is a contradiction, so our original assumption was wrong and there are
no such integers (i.e. $S$ is empty). □


Remark 1.2.3 To review, proofs by contradiction and contrapositive both 
start by assuming the negation of the conclusion. A proof by contrapositive 
uses that assumption to prove the negation of the original assumption. A proof 
by contradiction, on the other hand, leads to some absurdity, but not
necessarily just negating the original assumptions. In the proof above of Fact 1.2.2,
the contradiction is that you can’t have two different smallest elements of a
set.

³ For a very full discussion, see [E.7.20], but not until after you have started the next
chapter of this book!

1.2.2 Induction


Sometimes we need a way to prove a statement for all integers n after a certain 
point, for instance integers greater than or equal to n = 1. This is usually 
called proof by induction. Usually there are two steps in a typical ‘simple’ 
induction. 


1. First we prove the “base case” (often n = 1 or n = 0). 


2. Then we prove the “induction step”, that the case n = k implies case 
n=k+1. 


These combine to prove a fact for all cases n ≥ 1 (or n ≥ 0, if that was the base case).


Example 1.2.4 Archetype for Induction. We shall show that

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}.$$

Solution. The base case is to check that $1 = \frac{1(1+1)}{2}$, which is easy.
The induction step begins with the assumption that

$$\sum_{i=1}^{k} i = \frac{k(k+1)}{2}$$

and then proceeds by showing that the formula is still true when k is replaced
with k+1. For this proof, to add just one more integer k+1 to the sum means

$$\sum_{i=1}^{k+1} i = \sum_{i=1}^{k} i + (k+1)$$

(which we can see by rewriting the sum). Then we can just plug in the induction
assumption to obtain

$$\sum_{i=1}^{k} i + (k+1) = \frac{k(k+1)}{2} + (k+1) = (k+1)\left(\frac{k}{2} + 1\right) = \frac{(k+1)(k+2)}{2}$$

which is exactly what is required to finish the induction step, namely

$$\sum_{i=1}^{k+1} i = \frac{(k+1)\bigl((k+1)+1\bigr)}{2}.$$
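
A computer cannot replace the induction, but it can confirm that the formula was copied down correctly. The following is a small Python sketch; the names `sum_first` and `formula` are invented here for illustration. It compares the sum computed one term at a time with the closed form for the first few hundred values of n.

```python
def sum_first(n):
    """Add 1 + 2 + ... + n directly, one term at a time."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def formula(n):
    """Closed form n(n+1)/2; one of n and n+1 is even, so // is exact."""
    return n * (n + 1) // 2


# A spot check of finitely many cases, which is evidence and not a proof.
for n in range(1, 501):
    assert sum_first(n) == formula(n)

print(sum_first(10), formula(10))  # prints 55 55
```

Finding no disagreement in finitely many cases is exactly the kind of evidence that makes a statement worth proving; the proof itself is the induction above.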


Relative to some other basic axioms, one can actually take the legitimacy 
of induction as a final axiom and use that to prove well-ordering (Axiom 1.2.1) 
is true. Instructors will wish to note that the converse is false⁴ in general,
though both are true for the counting numbers or positive integers. We will 
not include any such proofs (or a collection of relevant axioms, such as Peano’s) 
here, but note the helpful exposition in [E.7.33]. 


⁴ A counterexample is given by the set of ordinals less than ω + ω, which is well-ordered
but for which induction does not hold.




1.2.3 Divisibility 


Definition 1.2.5 If an integer n can be written as a product kd = n of two
integers k and d, then we say that d divides n, or that n is divisible by d, or
that d is a divisor of n. We write d | n to denote that d divides n.


Example 1.2.6 Divisibility is familiar. 


• For instance, the concept that n is even is just the same thing as 2 | n.

• The divisors of n = 8 are ±1, ±2, ±4, ±8. (Don’t forget negative divisors.)

• Very often we can write this generically, so for example n | x + 1 means
that x + 1 can be written as the product of n and some other integer m.


We occasionally use the term proper divisor to denote a positive divisor of 
n which is not n. When n = 8, we see that 1, 2, and 4 are all proper divisors. 
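
To make these definitions concrete, here is a short Python sketch that lists divisors by trial division. The function names are hypothetical and chosen just for this example; Sage has its own built-in tools for divisors, which this sketch does not try to imitate.

```python
def positive_divisors(n):
    """Positive divisors of a nonzero integer n, found by trial division."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def all_divisors(n):
    """All divisors of n, negative ones included, in increasing order."""
    pos = positive_divisors(n)
    return [-d for d in reversed(pos)] + pos


def proper_divisors(n):
    """Positive divisors of a positive integer n other than n itself."""
    return [d for d in positive_divisors(n) if d != n]


print(all_divisors(8))     # prints [-8, -4, -2, -1, 1, 2, 4, 8]
print(proper_divisors(8))  # prints [1, 2, 4]
```

Trying every candidate from 1 to n is slow for large n, but it follows Definition 1.2.5 literally, which is the point here.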


There are lots of interesting things to say about divisibility. Let’s prove 
a somewhat unexpected statement using induction and just what we already 
know. 


Example 1.2.7 Show that $4 \mid 5^n - 1$ for $n \geq 0$.

Solution. This proof will proceed by induction. This time the base case will
be n = 0. We’ll try to make the steps clear with separate bullets.

• Base step: If n = 0 the formula says that 4 divides $5^0 - 1 = 1 - 1 = 0$,
which is definitely true.

• Induction step:
  ◦ Suppose $4 \mid 5^k - 1$. Then, by Definition 1.2.5, $5^k - 1 = 4x$ for some
    integer $x$.
  ◦ Hence $5^k = 1 + 4x$ is a fact we could use later.
  ◦ Our goal in this step is to show $4 \mid 5^{k+1} - 1$.
  ◦ Since we need something true about $5^{k+1} - 1$, let’s consider $5^{k+1}$
    first. The key observation will be that $5^{k+1} = 5^k \cdot 5$.
  ◦ Using the fact we obtained from the induction assumption we can
    write this as $5^k \cdot 5 = (1 + 4x) \cdot 5$; this means that
    $$5^{k+1} - 1 = 5(1 + 4x) - 1 = 20x + 4.$$
  ◦ Certainly $20x + 4$ is divisible by 4.
  ◦ Thus we have shown that $4 \mid 5^{k+1} - 1$, so we have finished the
    induction step, and our proof by induction is complete.
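
The induction step actually tells us how to compute the integer x at each stage: since 20x + 4 = 4(5x + 1), the x for n = k + 1 is 5x + 1. The Python sketch below follows that recipe; the function name `witnesses` is made up for this illustration.

```python
def witnesses(count):
    """Return x_0, ..., x_{count-1} with 5**k - 1 == 4 * x_k.

    Each new value comes from the induction step: if 5**k = 1 + 4x,
    then 5**(k+1) - 1 = 20x + 4 = 4(5x + 1).
    """
    xs = []
    x = 0  # base case: 5**0 - 1 = 0 = 4 * 0
    for k in range(count):
        assert 5**k - 1 == 4 * x  # the statement for n = k
        xs.append(x)
        x = 5 * x + 1             # the x that works for n = k + 1
    return xs


print(witnesses(5))  # prints [0, 1, 6, 31, 156]
```

The `assert` inside the loop never fails, which is a small computational echo of what the induction proves for every n.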


There are lots of other propositions about divisibility you are probably
familiar with from previous courses. Proposition 1.2.8 has a sampler.

Proposition 1.2.8 Divisibility Facts.
1. If a | b and b | c then a | c.
2. If a | b then ca | cb.
3. If c | a and c | b then c | au + bv for any integers u, v.
4. If n > 0 then all divisors of n are less than or equal to n.

These are not hard to prove (see Exercise 1.4.1) using direct proof, where
no indirect or inductive steps are needed. For instance, the second one can be
proved by simply noting b = ka for some $k \in \mathbb{Z}$, so that cb = c(ka) = c(ak) =
(ca)k. The others are similar, and are good practice with using basic algebraic
manipulation in proof.
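
Before proving facts like these, it can be reassuring to test them on random examples. The following Python sketch does so; `divides` and `spot_check` are names invented for this illustration, and the ranges of random integers are arbitrary choices. Passing such a check is not a proof, of course.

```python
import random


def divides(d, n):
    """True when n = k*d for some integer k (Definition 1.2.5)."""
    if d == 0:
        return n == 0
    return n % d == 0


def spot_check(trials=1000, seed=0):
    """Try the four facts of Proposition 1.2.8 on random small integers."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, k, m, c, u, v = (rng.randint(-20, 20) for _ in range(6))
        b = k * a      # built so that a | b
        c2 = m * b     # built so that b | c2
        assert divides(a, b) and divides(b, c2) and divides(a, c2)  # fact 1
        assert divides(c * a, c * b)                                # fact 2
        # fact 3: c divides both c*k and c*m, hence their combinations
        assert divides(c, (c * k) * u + (c * m) * v)
        n = abs(k) + 1  # a positive integer for fact 4
        assert all(d <= n for d in range(-2 * n, 2 * n + 1)
                   if d != 0 and divides(d, n))                     # fact 4
    return True


print(spot_check())  # prints True
```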


1.3 Where are we going? 


Before moving on from these preliminaries and our introductory Prologue, let’s 
step back. What will we cover in this text? 


• We have started by exploring basic integer questions, and will continue
looking at basic integers at first (Chapter 1–Chapter 3).

• We’ll be essentially forced to move to the concepts of congruences and
primes by the material (Chapter 4–Chapter 7).

• Next, we’ll explore a more advanced point of view of the concepts of
integers and congruences, including groups, to attack cryptography
efficiently (Chapter 8–Chapter 12).

• About halfway through, we’ll introduce the ways in which geometry
infiltrates number theory (Chapter 13–Chapter 17).

• Finally, functions and limits will help us illuminate primes in depth, as
well as show us how the ideas of calculus really do show up in number
theory quite naturally (Chapter 18–Chapter 24), concluding with an
introduction to the legendary Riemann Hypothesis in Chapter 25.


Let’s get ready for an exciting exploration of number theory! 


1.4 Exercises 


1. Prove some or all of the facts in Proposition 1.2.8.
2. Find a counterexample to show that when a | b and c | d, it is not
necessarily true that a + c | b + d.
3. Prove using induction that $2^n > n$ for all integers $n \geq 0$.
4. Prove, by induction, that if c divides integers $a_i$ and we have other integers
$u_i$, then $c \mid \sum_{i=1}^{n} a_i u_i$.
Exercise Group. Exploring the conductor question is a fun way to do new
math where you don’t already know the answer!
5. Write up a proof of the facts from the first discussion about the
conductor idea (in Section 1.1) with the pairs {2,3}, {2,4}, and {3,4}.
6. What is the conductor for {3,5} or {4,5}? Prove these in the same
manner as in the previous problem.
7. Try finding a pattern in the conductors. Can you prove something
about it for at least certain pairs of numbers, even if not all pairs?
8. What is the largest number d which is a divisor of both 60 and 42? 
9. Try to write the answer to the previous problem as d = 60x + 42y for 
some integers x and y. 
10. Get a Sage worksheet account somewhere, such as at https://cocalc.com 
(CoCalc) or at a Sage notebook or Jupyterlab server on your campus, if 
you don’t already have one. 


11. Color the ‘Paint by Numbers’ FoxTrot Sunday comic⁵ from 2006.

[Figure: ‘Paint by Numbers’ FoxTrot comic. Coloring key, as far as it can be
recovered: divisible by 13 = green; divisible by 17 = orange; divisible by 19 =
red; prime numbers = yellow.]

Figure 1.4.1 FOXTROT © Bill Amend. Reprinted with permission of
ANDREWS MCMEEL SYNDICATION. All rights reserved.


1.5 Using Sage for Interactive Computation 


This text is advertised as having interactive computation, but so far any
computation has been your own. How does digital computation (interactive or not)
fit in? We’ll skip ahead slightly here to see how this will work.

In the interactive version of this text, the areas below are called Sage cells,
or cells for short. Assuming you’re connected to the internet, this very first cell
will use SageMath⁶ (usually just called Sage) to check whether a given fraction
remains a fraction when reduced, or whether it reduces to an integer. Click
“Evaluate” to try it out.


38/19 


2 


Again, if you’re viewing this online, go ahead and try changing the numbers, 
clicking the evaluate button again. 
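
If you only have plain Python at hand rather than Sage, a similar experiment is possible with the standard library’s `fractions` module. This is a sketch of a comparable check and not what the Sage cell does internally; the helper name `reduces_to_integer` is an assumption made for this example.

```python
from fractions import Fraction


def reduces_to_integer(numerator, denominator):
    """True when numerator/denominator, in lowest terms, has denominator 1."""
    return Fraction(numerator, denominator).denominator == 1


print(Fraction(38, 19))            # prints 2
print(Fraction(38, 18))            # prints 19/9
print(reduces_to_integer(38, 19))  # prints True
```

Note that in plain Python 3 the expression 38/19 on its own gives the floating-point number 2.0, which is one reason Sage’s exact arithmetic is so convenient for number theory.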


5Shareable version available online, in lower resolution at 
Licensing. andrewsmcmeel.com/features/ft?date=2006-04-09, or see the online version 
of the text for a non-shareable version in higher resolution. 

Swww. sagemath.org 
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As we go through the text, you’ll see lots of opportunities to use Sage. 
Sometimes I’ll give you the opportunity to learn a little bit about how to use 
it in Sage notes, such as the following one. 


Sage note 1.5.1 About Sage notes. Sage notes will teach you useful things 
about basic programming, or more general facts about Sage and Python⁷, the
computer language Sage is based on. 

Let’s try another computational cell. We haven’t defined prime numbers 
yet (see Chapter 6), but I figure you know what they are. Here you can check 
whether an integer is prime. 


is_prime(3169)


True 


Sage note 1.5.2 Using commands in Sage cells. Assuming you are 
using this book online, you can put any legitimate Sage command in the cells 
above. (Try integrate(x^3,x) if you know some calculus.) Or you can use
these commands in your own Sage worksheet at your local Sage server or with 
CoCalc, so that you can save your work! 

If you are using an offline or hard copy version, I still highly recommend 
sifting through some of the code and commands; much of it will enlighten the 
reader. (Then try it out online or on your local computer!) 

Finally, let’s test some conductor ideas using Sage. In the cell below, Sage
will automatically list all the nonnegative numbers up to n that can be written
as ax + by for nonnegative integers x and y. The default values are
a = 3, b = 4; you can experiment by changing one or both of these values.


@interact
def _(a=(3,[2..10]),b=(4,[2..10]),n=(20,[10..50])):
    # Naive search: form every a*x+b*y with x, y small enough, then keep those <= n.
    list_of_them=list(set([a*x+b*y for x in srange(n/a+1) for y in srange(n/b+1)]))
    list_of_them=[item for item in list_of_them if item <= n]; list_of_them.sort()
    pretty_print(html("The nonnegative integers up to $n=%s$ which can be"%(str(n))))
    pretty_print(html("written as positive combinations of $a=%s$ and $b=%s$ are:"%(str(a),str(b))))
    print(list_of_them)


Notice that with the default values above we are definitely getting the same 
answers as expected from our ‘pencil and paper’ experiments. 

Finally, notice that the algorithm I used in the code is very naive — I just 
listed all possible combinations under a certain size. It would be interesting to 
use this to try to verify patterns you may have noticed about the precise size 
of the conductor, and when it exists. 
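The naive search just described can also be written in plain Python, which may be handy for experimenting outside a Sage cell. This is only a sketch: the function names `representable` and `missing` are made up for this illustration and are not part of Sage.

```python
def representable(a, b, n):
    """Sorted values a*x + b*y that are at most n, with x, y >= 0."""
    found = set()
    for x in range(n // a + 1):
        # Only try y values that keep a*x + b*y at most n.
        for y in range((n - a * x) // b + 1):
            found.add(a * x + b * y)
    return sorted(found)


def missing(a, b, n):
    """Numbers from 0 to n that cannot be written as a*x + b*y."""
    reachable = set(representable(a, b, n))
    return [k for k in range(n + 1) if k not in reachable]


print(representable(3, 4, 20))
print(missing(3, 4, 20))
```

Looking at the list of missing numbers for several pairs a, b is one way to test a guess about the conductor; the bound n has to be chosen large enough for the pattern to show.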


⁷ www.python.org
Summary: Prologue


After reminding ourselves of The Integers, this introductory chapter covers 
the following main topics. 


1. In Question 1.1.1 we introduce the notion of the conductor to get thinking 
about nontrivial integer questions. 


2. We review basic uses of the following principles: 


• The Well-Ordering Principle
• Proofs by Induction
• Basic facts about Divisibility, of which we will especially use Proposition 1.2.8


3. We get a brief look at where we are going in this text. 


Finally, after the usual Exercises, there are a few notes on Using Sage for
Interactive Computation.


Chapter 2 


Basic Integer Division 


In this chapter, we introduce some concepts of numbers which are familiar, but 
key for our further study. In particular, we try to understand why they work. 


• The division algorithm (Section 2.1),
• The greatest common divisor (Section 2.2), and
• The Euclidean algorithm (Section 2.3).


Then we’ll put them together with the Bezout identity (Section 2.4). 


2.1 The Division Algorithm 


2.1.1 Statement and examples 


Let’s start off with the division algorithm. This is the familiar elementary
school fact that if you divide an integer a by a positive integer b, you will
always get an integer remainder r that is nonnegative, but less than b.

Equally important, there is only one possible remainder under these circumstances.

Theorem 2.1.1 Division Algorithm. For $a, b \in \mathbb{Z}$ and $b > 0$, we can
always write $a = qb + r$ with $0 \le r < b$ and $q$ an integer. Moreover, given
$a, b$ there is only one pair $q, r$ which satisfies these constraints. We call the first
element $q$ the quotient, and the second one $r$ the remainder.

Proof. The proof appears below in Subsection 2.1.2. ∎

Finding q and r is easy in small examples like a = 13, b = 3.


We have $13 = 4 \cdot 3 + 1$, so $q = 4$ and $r = 1$.


For bigger values it’s nice to have the result implemented in Sage. 


divmod(281376, 29)


(9702, 18) 


We can check the correctness of the Sage output by multiplying and adding 
back together. 
9702*29+18


281376 


Sage note 2.1.2 Counting begins at zero. There are several things to 
note about this early computation. First, note that the answer to divmod came 
in parentheses, a so-called tuple data type. 

Second, there is another way to approach this computation, more programmatically
so that it’s easier to reuse. What do you think the [0] and [1]
mean?


divmod(281376,29)[0] * 29 + divmod(281376,29)[1]


281376 


To access the first and second parts of the answer (the quotient and re- 
mainder), we use square brackets, asking for the 0th and 1st parts of the tuple 
(9702,18)! (This operation is called indexing.) In Python, the programming 
language behind Sage (as in many other languages), counting begins at zero. 
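As a small aside on indexing, Python also lets you unpack a tuple into named pieces, which can read more clearly than `[0]` and `[1]`. The snippet below is a sketch in plain Python; the variable names are chosen only for illustration.

```python
# Unpack the tuple returned by divmod into two named values.
q, r = divmod(281376, 29)
print(q, r)

# The same pieces, fetched by index (counting from zero).
pair = divmod(281376, 29)
print(pair[0] * 29 + pair[1] == 281376)

# With a positive divisor, Python's divmod keeps the remainder nonnegative,
# even when the number being divided is negative.
negative_case = divmod(-13, 3)
print(negative_case, negative_case[0] * 3 + negative_case[1] == -13)
```

The last lines match the statement of Theorem 2.1.1, where the remainder is required to satisfy 0 ≤ r < b.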

The discussion in the previous note actually turns out to be an enduring 
argument in number theory, too. Do we only care about positive numbers, or 
nonnegative ones as well? We saw this in the stamps example, since one could 
send a package for free under certain circumstances (campus mail), but might 
not care about that case. Similarly, are we required to use at least one of each 
type of stamp, or is it okay (as in our problem) to not use one type? 


2.1.2 Proof of the Division Algorithm 


One neat thing about the division algorithm is that it is not hard to prove but 
still uses the Well-Ordering Principle; indeed, it depends on it. The key set is 
the set of all possible remainders of a when subtracting multiples of b, which 
we call 

$$S = \{a - kb \mid k \in \mathbb{Z}\}.$$


(Note that the set looks the same if we add multiples of $b$, since $k \in \mathbb{Z}$, but for
the purposes of exposition it is easier to think of it as subtraction.)

The object of main interest in the proof will be the nonnegative piece of
$S$, which we will call $S' = S \cap \mathbb{N}$. For example, if $a = 13$, $b = 3$, then $S =
\{\ldots, 19, 16, 13, 10, 7, 4, 1, -2, -5, \ldots\}$ while $S' = \{\ldots, 19, 16, 13, 10, 7, 4, 1\}$.

Our strategy will be to apply the well-ordering principle to $S'$. (It is worth
thinking briefly about why both $S$ and $S'$ are nonempty.) Give the name $r$ to
the smallest element of $S'$, which must be writeable as $r = a - bq$ (that’s the
definition of being an element of $S' \subseteq S$, after all).

Now let’s briefly suppose by way of contradiction that $r \ge b$. In that case
we could subtract $b$ from $r$, and then $r - b \in S'$ as well (it is still nonnegative
and still of the form $a - kb$). So $r$ would not be the
least element of $S'$, which is a contradiction. Hence we know that $r < b$. (Note
that $r$ is the smallest nonnegative number in $S'$, just as with our intuition
regarding remainders from school.)

We still have to show that $r$ and $q$ are the only numbers fulfilling this
statement. Suppose $a = bq' + r'$ for some integers $q', r'$ where $0 \le r' < b$;
clearly if $r = r'$ then we can solve $a - bq = r = r' = a - bq'$ to get $q = q'$ (since
$b > 0$), so the only interesting case is if $r \ne r'$. Without loss of generality, we
can assume $r < r'$.

In that case, $a - bq = r < r' = a - bq'$, which can be rewritten as $0 <
r' - r = b(q - q')$. Since $q, q' \in \mathbb{Z}$, by Fact 1.2.2 $q - q'$ must be at least one if
it isn’t zero. But then $b = b \cdot 1 \le r' - r = b(q - q')$, or $b \le r + b \le r'$, which
contradicts $0 \le r' < b$. Thus $q - q' = 0$ and hence $q = q'$ and $r = r'$.

It’s worth actually trying out the details of this proof with some a and b,
say with a = 26 and b = 3.
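One way to try out the details is to mimic the proof in code: build a finite piece of the set S, keep the nonnegative part, and take its least element. The following plain Python sketch does this; the function name and the `window` parameter are hypothetical choices for this illustration, and the window must be large enough to contain the true quotient.

```python
def division_by_well_ordering(a, b, window=50):
    """Find (q, r) as in the proof: r is the least nonnegative a - k*b."""
    assert b > 0
    # A finite slice of S = {a - k*b}; each entry remembers its k.
    slice_of_S = [(a - k * b, k) for k in range(-window, window + 1)]
    # S' is the nonnegative piece; its least element is the remainder.
    r, q = min((value, k) for value, k in slice_of_S if value >= 0)
    return q, r


print(division_by_well_ordering(26, 3))
print(division_by_well_ordering(-13, 3))
```

This is far less efficient than `divmod`, but it follows the argument step by step, so the two should agree whenever the window is wide enough.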

As a scholium (see Exercise 2.5.1) note that if $b < 0$ there can still be a
positive remainder, but here we would need $0 \le r < |b|$ in the theorem.


2.1.3 Uses of the division algorithm 


It’s kind of fun to prove interesting things about powers using the division 
algorithm, and likely you did in a previous course. For instance, there is an 
interesting pattern in the remainders of perfect squares when dividing by 4. If you
are online, evaluate the following Sage cell to see the pattern. (It’s also easy 
to just get the remainders of the first ten or so perfect squares by hand.) 


for i in [0..10]:
    pretty_print(html("The remainder of {} squared with respect to 4 is {}".format(i,divmod(i^2,4)[1])))


Sage note 2.1.3 Repeating commands for different input. The syntax 
for i in [0..10]: just means we want to do the next command for integers
from 0 to 10. Such a repetition is called a loop.

Another way Python uses to generate the list of different input is the range
command; try substituting range(11) for [0..10] in the Sage cell above. Can
you discover what the difference is between these? 

The rest of the command (all the percent symbols and so forth) is mostly 
for correct formatting. That includes the indentation in the second line — an 
essential part of Python and Sage. 

This certainly provides strong numerical evidence for the following propo- 
sition. But better than that will be the proof! 


Proposition 2.1.4 A perfect square always leaves remainder $r = 0$ or $r = 1$
when divided by 4.

Proof. Using the division algorithm, we can write $n = 4q + r$. What happens
if we square it, $(4q + r)^2$?

Algebraically this yields $16q^2 + 8qr + r^2$. Clearly this is a multiple of 4 plus
$r^2$. So the only possible remainders of $n^2$ are the remainders of $r^2$, where $r$ is
already known to be less than 4!

Now check these yourself to see that the only possibilities are the ones in the
statement of the proposition. ∎
One cool thing about this proof is that if we just change the proof from
using $(4q + r)^2$ to one using $(mq + r)^2$, we can essentially do the same
thing for several divisions at once. If the number we divide by is $m$, then


$$(mq + r)^2 = m^2q^2 + 2mqr + r^2 = m(mq^2 + 2qr) + r^2,$$


hence all that matters for the final remainder is $r^2$, since the rest is already
divisible by $m$.

But we know that there are only $m$ possibilities for $r$, so it’s easy to check
all their squares. For $m = 6$, the following cell checks for you if you don’t want
to check them by hand.


for i in [0..5]:
    pretty_print(html("The remainder of %s squared with respect to 6 is %s"%(i,divmod(i^2,6)[1])))
This verifies that r = 0,1,3,4 are the only possible remainders of perfect
squares when you divide by six. 
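Since only the remainder r matters, the same check can be packaged for any modulus m. Here is a minimal sketch in plain Python; `square_remainders` is a name invented for this illustration.

```python
def square_remainders(m):
    """Sorted list of remainders that perfect squares can leave modulo m."""
    # By the argument above it is enough to square r = 0, 1, ..., m - 1.
    return sorted({(r * r) % m for r in range(m)})


print(square_remainders(4))
print(square_remainders(6))
```

Trying other values of m is a quick way to gather evidence for questions like the one in Exercise 2.5.2.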


2.2 The Greatest Common Divisor 


It seems intuitive that of all the numbers dividing a number (the divisors of 
the number), one is biggest. We can carry that idea to two numbers. 


Definition 2.2.1 Common Divisors. If we consider the various divisors of 
two numbers a and b, we say that d is a common divisor of a and b if d|a 
and d | b. If d is the biggest such common divisor, it is called the greatest 
common divisor, or gcd, of a and b, written d = gcd(a, b). ◊


Example 2.2.2 What are all the common divisors of 6 and 10? What is their 
gcd? 


Remark 2.2.3 What is the greatest common divisor of zero and zero? By definition,
there is none (or it is infinity?). Some authors (such as [E.2.1]) simply
don’t allow this case at all; others (like [E.2.4]) define it to be zero without
further comment. As for computation, both SageMath¹ and Wolfram Alpha²
apparently compute it to be zero (perhaps by The Euclidean Algorithm), while
one online calculator³ throws an error.

This text chooses to remain agnostic on this point. However, ring theory
and lattice theory both allow for an alternate definition which naturally yields
zero as the answer; either consult an abstract algebra text, or see all the answers
to this question at Mathematics StackExchange⁴ for some good fireside reading
after you do your homework for this section.


We now come to a great definition-theorem. 


Theorem 2.2.4 Characterizing the greatest common divisor. Let a 
and b be integers, not both zero. Then the greatest common divisor of a and b 
is all of the following: 


• The largest integer d such that d | a and d | b. (This is Definition 2.2.1.)


• The number achieved by applying the Euclidean algorithm (a repeated
division algorithm) to a and b. (See Section 2.3.)


• The smallest positive number which can be written as ax + by for some
integers x and y. (See Section 2.4 and Subsection 2.4.2.)

This is amazing, and the first real indication of the power of having multiple 
perspectives on a problem. It means that the very theoretical issue of when a 
gcd exists (and finding it) can be treated as a purely computational problem, 
completely independent of finding divisors in the usual sense. And further, 
there is a definition purely in terms of addition and multiplication, nothing 
more complex. 

If you need to actually calculate a gcd, you use the algorithm. If you want 
to prove something about it that has to do with dividing, you use the original 
definition. And if you need to prove something about it where division is hard 
to use, you use the third characterization. This sort of idea will come up again 
and again in this book — that having multiple ways to define something really 
helps. 
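To see the three characterizations of Theorem 2.2.4 agree on small numbers, one can compute each of them directly. The sketch below is plain Python for positive inputs; all three function names and the `window` parameter are assumptions made for this illustration, and the brute-force search over combinations only works when the window is large enough to contain suitable x and y.

```python
def gcd_by_divisors(a, b):
    """Largest d dividing both a and b (the original definition)."""
    return max(d for d in range(1, max(a, b) + 1) if a % d == 0 and b % d == 0)


def gcd_by_euclid(a, b):
    """Repeated division: the last nonzero remainder."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_by_combinations(a, b, window=30):
    """Smallest positive a*x + b*y with x and y in a finite window."""
    values = (a * x + b * y
              for x in range(-window, window + 1)
              for y in range(-window, window + 1))
    return min(v for v in values if v > 0)


print(gcd_by_divisors(60, 42), gcd_by_euclid(60, 42), gcd_by_combinations(60, 42, window=70))
```

Agreement on examples is only evidence, of course; the proofs are in Section 2.3 and Subsection 2.4.2.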


¹ sagecell.sagemath.org/?z=eJxLT07RMNAx0AQACuICDA==
² www.wolframalpha.com/input/?i=gcd(0,0)

³ www.dcode.fr/gcd

⁴ math.stackexchange.com/questions/495119/what-is-gcd0-0
2.3 The Euclidean Algorithm


The Euclidean algorithm says that to find the gcd of a and b, one performs 
the division algorithm until zero is the remainder, each time replacing the 
previous divisor by the previous remainder, and the previous number to be 
divided (sometimes called dividend) by the previous divisor. The last non-zero 
remainder is the gcd. 

We’ll state and prove this momentarily (Algorithm 2.3.3). Let’s try it with 
a reasonably sized problem. 


Example 2.3.1 Let a = 60 and b = 42. 


$$\begin{aligned}
60 &= 42 \cdot 1 + 18\\
42 &= 18 \cdot 2 + 6\\
18 &= 6 \cdot 3 + 0
\end{aligned}$$


So gcd(60, 42) = 6. 

This procedure is named after Euclid because of Proposition VII.2⁵ in Euclid’s
Elements. There is an amazing complete Java interactive implementation
of all the propositions, by David Joyce⁶, whose version of this proposition includes
some explanation of Euclid’s background assumptions. In particular,
Euclid basically assumes the Well-Ordering Principle, although of course he 
didn’t think of it in such anachronistic terms. 


Historical remark 2.3.2 Euclid’s Elements. Euclid, a mathematician in 
Alexandria during the Hellenistic era, appears to have written the Elements as 
a compendium of rigorous mathematical knowledge. In addition to being the 
main geometry textbook in the Western and Islamic worlds for two millennia 
(as late a teacher as Charles Dodgson a.k.a. Lewis Carroll extolled its virtues in 
print in Euclid and His Modern Rivals⁷), there are substantial number-theoretic
portions as well. No one really knows how much of the Elements is original 
to Euclid, but the work as a whole is monumental and well-organized, despite 
some well-known criticisms (see e.g. the discussion in [E.5.5]). 

Try the algorithm on your own by hand for the gcd of 280 and 126. Or, for 
even more practice, try it with gcd(2013, 1066) and then check your work with 
Sage. 


gcd(2013,1066) 


Algorithm 2.3.3 Euclidean algorithm. To get the greatest common divisor 
of a and b, perform the division algorithm until you hit a remainder of zero, as 
below. 


$$\begin{aligned}
a &= bq_1 + r_1\\
b &= r_1q_2 + r_2\\
r_1 &= r_2q_3 + r_3\\
&\;\;\vdots\\
r_{n-3} &= r_{n-2}q_{n-1} + r_{n-1}\\
r_{n-2} &= r_{n-1}q_n + 0
\end{aligned}$$


⁵ aleph0.clarku.edu/~djoyce/java/elements/bookVII/propVII2.html
⁶ aleph0.clarku.edu/~djoyce/java/elements
⁷ books.google.com/books?id=rEUMAAAAYAAJ
Then the previous remainder, $r_{n-1}$, is the greatest common divisor.
Proof. First let’s see why this algorithm even terminates. The division algorithm
says each $r_i$ is less than the previous one, yet they may not be less than
zero. So let’s apply the Well-Ordering Principle to the set of remainders. This
set must have a least positive element, and it will be the answer. Another way
to think about it is that since $b$ is finite, there won’t be an infinite number of
steps.

Of course, that just gives a number, with no guarantee it has any connection
to the gcd. So consider the set of common divisors $d \mid a$ and $d \mid b$. All such $d$
also divide

$$a - q_1 b = 1 \cdot a + (-q_1) \cdot b = r_1.$$

So these $d$ also divide $r_2 = b - q_2 r_1$, and indeed divide all the remainders, even
$r_{n-1} = r_{n-3} - q_{n-1} r_{n-2}$. So all common divisors of $a$ and $b$ are divisors of
$r_{n-1}$.

On the other hand, if $d$ divides $r_{n-1}$, it divides $r_{n-2} = r_{n-1} q_n$, and thus divides
$r_{n-3} = r_{n-2} q_{n-1} + r_{n-1}$, and so forth. Hence $d$ divides $a$ and $b$.

So the set of common divisors of $a$ and $b$ is equal to the set of divisors of $r_{n-1}$,
so this algorithm really does give the gcd. ∎


As you might expect, the proof makes more sense if you try it out with 
actual numbers; for the theoretical view, see Exercise 2.5.14. Especially if you 
can find a and b for which the algorithm takes four or five steps, you will gain 
some insight. 
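If you would like to see every line of the algorithm for numbers of your own choosing, a short loop suffices. This is a sketch in plain Python for positive a and b; `euclid_steps` is a name made up for this illustration, not a Sage command.

```python
def euclid_steps(a, b):
    """Run the Euclidean algorithm, recording each division a = q*b + r."""
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append((a, b, q, r))
        # The divisor becomes the next dividend, the remainder the next divisor.
        a, b = b, r
    # When the remainder hits zero, the last divisor is the gcd.
    return steps, a


steps, g = euclid_steps(60, 42)
for dividend, divisor, q, r in steps:
    print("{} = {}*{} + {}".format(dividend, divisor, q, r))
print("gcd:", g)
```

Each printed row has the same shape as a line of Algorithm 2.3.3, so the number of rows is the number of steps the algorithm took.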


2.4 The Bezout Identity 


2.4.1 Backwards with Euclid 


Now, before we get to the third characterization of the gcd, we need to be 
able to do the Euclidean algorithm backwards. This is sometimes known as the 
Bezout identity. 


Definition 2.4.1 Bezout identity. A representation of the gcd d of a and 
b as a linear combination ax + by = d of the original numbers is called an 
instance of the Bezout identity. (This representation is not unique.) ◊

It is worth doing some examples⁸. Perhaps you already have gotten one,
probably by trial and error. For instance, 


$$6 = -2 \cdot 60 + 3 \cdot 42.$$


The third characterization in Theorem 2.2.4 implies that doing this is al- 
ways possible; gcd(a,b) = ax + by for some integers x and y. Doing the 
Euclidean algorithm backwards is one way to obtain this. 


Example 2.4.2 Sometimes it helps visually when starting to write the Euclid- 
ean algorithm down one side of a table, and then go up the other side of the 
table to obtain an instance of the Bezout identity. 

Here’s an example with the gcd of 8 and 5; follow it from top left to the 
bottom and then back up the right side. The middle column provides the 
necessary rewriting. 


⁸ For convenience, all examples will be in the form $d = xa + yb$, putting the coefficients
first, even though we state this in the other order. The habit of using the letters a, b, d and
alphabetical order is too hard to break.
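$$\begin{array}{l|l|l}
8 = 1 \cdot 5 + 3 & 1 \cdot 8 - 1 \cdot 5 = 3 & 1 = 2 \cdot 3 - 1 \cdot 5 = 2 \cdot (8 - 1 \cdot 5) - 1 \cdot 5 = 2 \cdot 8 - 3 \cdot 5\\
5 = 1 \cdot 3 + 2 & 1 \cdot 5 - 1 \cdot 3 = 2 & 1 = 1 \cdot 3 - 1 \cdot 2 = 1 \cdot 3 - 1 \cdot (5 - 1 \cdot 3) = 2 \cdot 3 - 1 \cdot 5\\
3 = 1 \cdot 2 + 1 & 1 \cdot 3 - 1 \cdot 2 = 1 & 1 = 1 \cdot 3 - 1 \cdot 2\\
2 = 2 \cdot 1 + 0 & & \text{Go up this column...}
\end{array}$$


So $1 = 2 \cdot 8 - 3 \cdot 5$, or $2 \cdot 8 + (-3) \cdot 5$.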


Example 2.4.3 Usually students need a couple of examples of this to get the 
way this works, so here is another one. Let’s do it with the gcd of 60 and 42. 


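$$\begin{array}{l|l|l}
60 = 1 \cdot 42 + 18 & 1 \cdot 60 - 1 \cdot 42 = 18 & 6 = 1 \cdot 42 - 2 \cdot 18 = 1 \cdot 42 - 2 \cdot (60 - 1 \cdot 42)\\
42 = 2 \cdot 18 + 6 & 1 \cdot 42 - 2 \cdot 18 = 6 & 6 = 1 \cdot 42 - 2 \cdot 18\\
18 = 3 \cdot 6 + 0 & & \text{Go up this column...}
\end{array}$$

Simplifying $1 \cdot 42 - 2 \cdot (60 - 1 \cdot 42)$ (the top line on the right), we get $6 =
3 \cdot 42 + (-2) \cdot 60$ again.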


This question of the Bezout identity is implemented in Sage as xgcd(a,b), 
because this is also known as the eXtended Euclidean algorithm. 


xgcd(60, 42)


(6, -2, 3)

Or, $6 = -2 \cdot 60 + 3 \cdot 42$, once again.
Example 2.4.4 Try to get the xgcd/Bezout identity for gcd(135, 50) using
this algorithm. You should get $5 = 3 \cdot 135 + (-8) \cdot 50$. Can you get another
one a different way?

Try the following Sage cell to check that it works.


xgcd(135,50)[1]*135 + xgcd(135,50)[2]*50 


5 


Sage note 2.4.5 Reminder: how to get list elements. Do you remember
what the [1] means? What do you think the [2] means in this context? 


Example 2.4.6 Try to get the xgcd/Bezout identity for gcd(1415, 1735) using
this algorithm. Hopefully you get $5 = 103 \cdot 1415 + (-84) \cdot 1735$, though it may
take a while! The previous example might help you on your way.
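For checking hand computations like these without Sage, here is one possible plain Python sketch of an extended Euclidean algorithm. Note that it is a different bookkeeping from the tables above: instead of going back up the right-hand column, it carries the coefficients forward alongside the remainders. The name `extended_gcd` is an assumption of this illustration (Sage's own command is `xgcd`), and the inputs are assumed nonnegative and not both zero.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) and d = x*a + y*b."""
    # Invariant: old_r = old_x*a + old_y*b and r = x*a + y*b at every step.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


for a, b in [(60, 42), (135, 50), (1415, 1735)]:
    d, x, y = extended_gcd(a, b)
    print("{} = {}*{} + {}*{}".format(d, x, a, y, b), x * a + y * b == d)
```

Whatever coefficients an implementation returns, the check `x*a + y*b == d` is the part that matters, since the representation is not unique.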


Historical remark 2.4.7 Bezout and friends. While Étienne Bézout⁹ did
indeed prove a version of the Bezout identity for polynomials, the basics of 
using the extended Euclidean algorithm to solve such equations was known in 
Europe to Bachet de Méziriac (see Historical remark 3.5.2) about four hundred 
years ago. However, the Indian mathematician Aryabhata about 1500 years 
ago in his method later called the Kuttaka¹⁰ used essentially the same algorithm,
in fact in a manner more amenable to swift and accurate usage than
the one we (and most Western texts) use, with a view toward questions such 
as Theorem 3.1.2. 


2.4.2 Proving the final characterization 


The final characterization of the greatest common divisor (Theorem 2.2.4) is 
that it is the least positive integer which can be written ax + by for integers 


x, y. Let’s prove that now.


⁹ mathshistory.st-andrews.ac.uk/Biographies/Bezout
¹⁰ en.wikipedia.org/wiki/Kuttaka

First, we know there are some positive integers which can be written $ax + by$
(just use positive $x, y$, or negative ones if $a$ or $b$ are negative). So, by the
Well-Ordering Principle, we know there is a smallest such positive integer, which
we will call $c = au + bv$. Let’s also designate the gcd of $a$ and $b$ to be $d$.

By Proposition 1.2.8, any integer which divides $a$ and $b$ divides any $ax + by$,
so it divides $au + bv = c$. In particular, since $d$ is a divisor of both $a$ and $b$, it
must also divide $c$. So $d \le c$.

On the other hand, we know from the backward/extended Euclidean
algorithm/Bezout identity that $d$ can be written $d = ax' + by'$ for some integers $x'$
and $y'$. Since $c$ is the smallest such (positive) integer, $c \le d$. Thus we conclude
that $d = c$.


2.4.3 Other gcd questions 


We mentioned earlier there are many such linear combinations for any given 
pair a,b. How might we find more than one such representation? 


Example 2.4.8 Using Bezout to get another Bezout. We used the
backwards Euclidean algorithm to see that $6 = -2 \cdot 60 + 3 \cdot 42$. Let’s use that
to get another.


• Since 6 is itself a divisor of both 60 and 42, let’s pick one (the smaller
one!), 42, and write it as $42 = 7 \cdot 6$.


• Then we can really write
$$42 = 7 \cdot 6 = 7 \cdot (-2 \cdot 60 + 3 \cdot 42),$$
since after all we just saw that was a way to represent 6!

• Now we plug this back into the original equation:
$$\begin{aligned}
6 &= -2 \cdot 60 + 3 \cdot 42 = -2 \cdot 60 + 3 \cdot (7 \cdot 6)\\
&= -2 \cdot 60 + 3 \cdot (7 \cdot (-2 \cdot 60 + 3 \cdot 42))
\end{aligned}$$


If we simplify it out, that means $6 = -44 \cdot 60 + 63 \cdot 42$, which is indeed correct!


So, substituting a Bezout identity into itself yields more and more such 
identities. How many such identities are there? Is there a general form? 
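The substitution trick of Example 2.4.8 is mechanical enough to automate, which makes it easy to generate a few more identities and look for a pattern. The plain Python sketch below follows the same steps; the function name `substitute_bezout` is invented for this illustration.

```python
def substitute_bezout(a, b, d, x, y):
    """From d = x*a + y*b, build a new identity by rewriting b as k*d."""
    assert x * a + y * b == d and b % d == 0
    k = b // d  # b = k*d, for instance 42 = 7*6
    # d = x*a + y*(k*d) = x*a + y*k*(x*a + y*b); now collect the a and b terms.
    new_x = x + y * k * x
    new_y = y * k * y
    assert new_x * a + new_y * b == d
    return new_x, new_y


x, y = -2, 3
for _ in range(3):
    x, y = substitute_bezout(60, 42, 6, x, y)
    print("6 = {}*60 + {}*42".format(x, y))
```

The first line printed reproduces the identity from the example; the later ones grow quickly, which hints that this is not the only (or the most economical) way to produce new representations.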

Another interesting question is that some gcds of large numbers are very 
easy to compute. What makes finding gcd(42000, 60000) so easy? If you’re in 
a classroom, this is a perfect time to discuss. 

On a related note, if gcd(a, b) = d, could you make a guess as to a formula 
for gcd(ka, kb) (for k > 0)? Can you prove it in Exercise 2.5.16? (Hint: here 
is where our original definition or the Bezout version could be useful.) 


2.4.4 Relatively prime 


There is one final thing that the linear combination version of the gcd can 
give us. It is something you may think is familiar, but which can arise very 
naturally from the Bezout identity. 

Consider the smallest possible greatest common divisor, which is one. Under
what circumstances would $a$ and $b$ have $\gcd(a, b) = 1$? By our characterization,
it is precisely when you can write $ax + by = 1$ for some integers $x$ and $y$.

Think about this, though; if the gcd of a and b is 1, then we could write any
integer as a (linear) combination of a and b! This is a property I think people
would have come up with no matter how the development of mathematics had
gone; namely, identifying pairs of integers such that you can write any number
as a (linear) combination of them.
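To make this concrete: once 1 = ax + by is known, multiplying through by n gives n = a(nx) + b(ny). The following plain Python sketch does exactly that; the helper names `bezout` and `write_as_combination` are assumptions of this illustration, and the inputs a, b are assumed to be positive and coprime.

```python
def bezout(a, b):
    """Return (d, x, y) with a*x + b*y = d = gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    # If b*x + (a % b)*y = d, then a*y + b*(x - (a // b)*y) = d as well.
    d, x, y = bezout(b, a % b)
    return d, y, x - (a // b) * y


def write_as_combination(n, a, b):
    """Coefficients (X, Y) with a*X + b*Y = n, assuming gcd(a, b) = 1."""
    d, x, y = bezout(a, b)
    assert d == 1, "a and b must be coprime"
    return n * x, n * y


for n in [1, 7, -4, 100]:
    X, Y = write_as_combination(n, 8, 5)
    print("{} = 8*{} + 5*{}".format(n, X, Y))
```

Unlike the stamps problem, negative coefficients are allowed here, which is why every integer n can be reached.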


Definition 2.4.9 Relatively Prime. If the greatest common divisor of two 
numbers is one, we call them relatively prime numbers or coprime numbers. 

Later, we will need to have a term for the situation where, in a collection 
of several integers, all possible pairs are relatively prime. We will call this 
mutually coprime, coprime in pairs, or an analogous term. ◊


Proposition 2.4.10 Here are two interesting facts about coprime integers a 
and b: 


• If $a \mid c$ and $b \mid c$, then $ab \mid c$.


• If $a \mid bc$, then $a \mid c$.

Proof. The first is not too hard to prove, if you think in terms of Bezout. It
does need a little cleverness.


• Remember that $1 = ax + by$ for some $x, y$, by definition of being coprime.

• So $c = cax + cby$.


• Now write $c = kb$ and $c = \ell a$, and substitute them in the opposite parts
of the previous line.


• This gives $c = (kb)ax + (\ell a)by$, and $ab$ definitely divides both parts of this,
so it divides the whole thing by our earlier proposition about divisibility.


We leave the second as an exercise (Exercise 2.5.19). ∎

It’s also useful to try to find counterexamples! Can you find an example
where $\gcd(a, b) \ne 1$, $a \mid c$ and $b \mid c$, but $ab$ does not divide $c$? (See
Exercise 2.5.20.)
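Before hunting for counterexamples in the non-coprime case, it can be reassuring to confirm numerically that none exist in the coprime case. The plain Python sketch below tests both parts of Proposition 2.4.10 over a small range; the function name and the bounds `limit` and `c_max` are arbitrary choices for this illustration, and a finite check is of course no substitute for the proof.

```python
from math import gcd


def proposition_holds(limit=15, c_max=200):
    """Check both facts of the proposition for coprime a, b below limit."""
    for a in range(1, limit):
        for b in range(1, limit):
            if gcd(a, b) != 1:
                continue  # the proposition only speaks about coprime pairs
            for c in range(1, c_max + 1):
                # First fact: a | c and b | c should force ab | c.
                if c % a == 0 and c % b == 0 and c % (a * b) != 0:
                    return False
                # Second fact: a | bc should force a | c.
                if (b * c) % a == 0 and c % a != 0:
                    return False
    return True


print(proposition_holds())
```

Removing the `continue` line turns the same loop into a search tool for the counterexamples the text asks about.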


2.5 Exercises 


1. Try stating and proving the division algorithm (Theorem 2.1.1) but for 
b<0. 

2. Can you find an n such that the possible remainders of a perfect square 
when divided by n are all numbers between zero and n — 1? If you can, 
how many different such n can you find? If not, can you prove there are 
none? 

3. Write the gcd of 3 and 4 as a linear combination of 3 and 4 in three 
different ways. (Hint: trial and error.) 


4. You can define the gcd of more than two numbers as the greatest integer
dividing all of the numbers in your set. So, for instance, gcd(20, 30, 70) =
10. Calculate the gcd of some hard-looking sets of three numbers by listing
divisors.

With Sage you can calculate arbitrary gcds like this, so you can check
your work in this problem using the same command as before, but with
slightly different syntax.


gcd([3800,7600,1900])


1900


5. Find the gcd of the four numbers 1240, 6660, 15540, and 19980 without
Sage.

6. Prove that gcd(a, a + 2) = 1 if a is odd and gcd(a, a + 2) = 2 if a is even.

7. Let a be a positive integer. What is the greatest common divisor of a and
a + 1? Prove it.

8. Use the Euclidean algorithm to find the gcd of 51 and 87, and then to
write that gcd as a linear combination of 51 and 87.

9. Define the least common multiple of a and b to be the smallest positive
number which is divisible by both a and b. Prove that the least common
multiple of a and b is ab precisely when a and b are coprime.

10. Find the gcd of 151 and 187 using the Euclidean algorithm, then write the
gcd as a linear combination of these two numbers in two different ways.

11. Find the gcd of 500000001 and 5000001 in any way you see fit other than
asking someone else.

12. In the following interact you can explore the gcd of numbers of the form
$5 \cdot 10^n + 1$ for various n. Does the pattern you see continue? How would
you find a counterexample, or how might you prove it?


@interact
def _(m=(3,[1..20]),n=(2,[1..20])):
    pretty_print(html("The gcd of ${}$ and ${}$ is ${}$".format(5*10^m+1, 5*10^n+1, gcd(5*10^m+1,5*10^n+1))))


13. Find the gcd of three four digit numbers, none of which is divisible by ten.

14. To make the proof of the Euclidean algorithm, Algorithm 2.3.3, very complete,
one would want to use induction to replace “and so forth” verbiage.
Do so for practice with induction.

15. For nonzero a, b, c, prove that if a and c are coprime, and likewise b and c
are coprime, then ab and c are coprime. (Hint: use the Bezout identity.)

16. If gcd(a, b) = d and k > 0 is an integer, prove a formula for gcd(ka, kb).

17. You probably know the Fibonacci numbers $1, 1, 2, 3, 5, 8, \ldots$, where $f_{n+2} =
f_{n+1} + f_n$ and we number as $f_1 = 1$, $f_2 = 1$. Try applying the Euclidean
algorithm to a pair of consecutive Fibonacci numbers. As a function or
formula of n, how long does it take? (For a more general approach see
[E.2.1, Exercises 1.17-1.19].)

18. Try the above exercise again, but with a variant of the Fibonacci numbers
where $f_{n+2} = f_{n+1} + 2f_n$. This would start $1, 1, 3, 5, 11, 21, \ldots$.

19. Prove the second piece of Proposition 2.4.10 that if a and b are coprime,
and if a | bc, then a | c. (Hint: use the Bezout identity again. Later
you will have the opportunity to prove this with more powerful tools; see
Exercise 6.6.6.)
20. Find examples that contradict the conclusions of Proposition 2.4.10 if a
and b are not coprime (i.e. share a factor greater than 1).


21. Verify that $\gcd(a, b) = \gcd(-a, -b)$. (Contributed by Shawn Feng.)

Exercise Group. The next two exercises consider a related concept to
relatively prime.


22. We discussed relatively prime numbers in this chapter. Write down 
your own definition of a prime number. Then compare it with the 
book, a few internet sources, or some other authoritative source. 
Should 1 be considered prime? What about —1? 


23. Search books and/or the Internet and find at least three different 
proofs that there is no largest prime number. (Ours, Theorem 6.2.1, 
is the oldest one we know of.) You don’t have to understand all the 
details; they should be fairly different from each other, though. Do 
any of the proofs generate all primes in order? 


Summary: Basic Integer Division 
Here are some of the main results of this chapter. 
1. The Division Algorithm is a foundational result. 


• We use it immediately to prove a well-known fact in Proposition 2.1.4.

• Note that the proof in Subsection 2.1.2 uses the Well-Ordering Principle.


2. We review Common Divisors and the greatest common divisor, introduc- 
ing its characterization in Theorem 2.2.4. 


3. The Euclidean algorithm is foundational for this task; see Example 2.3.1 
for a good example. 


4. Then we use the previous section’s work to prove the Bezout identity. 


• We do several examples.


• Importantly, we use this notion to introduce the key concept of
Relatively Prime, and prove some facts about this concept.


Finally, we have Exercises. 
Chapter 3


From Linear Equations to Geometry


So far, we have mostly investigated topics that will seem familiar even to the 
high school student; for instance, the gcd shows up in adding fractions with 
unequal denominators. 

What makes number theory so interesting is that even a slight change in the 
questions we ask, or the way in which we approach them, can yield completely 
unexpected insights. 

In this section, we will begin this process by going from the simple ques- 
tions we started with into more subtle ones, largely motivated by a surprising 
connection with geometry. 


3.1 Linear Diophantine Equations 


The first goal for this chapter is to completely solve all linear Diophantine
equations (of two variables¹). This is the question of finding solutions $x, y \in \mathbb{Z}$
of equations of the generic form


$$ax + by = c \text{ for given } a, b, c \in \mathbb{Z}.$$


Historical remark 3.1.1 Diophantine and his equations. These equa- 
tions have been studied since the late Roman era, most notably by the (Greek 
speaking) mathematician Diophantus², from whom we derive their name,
though we know little else about him. One of the most notable things about 
Diophantus’ work is that it incorporates a proto-algebra which begins to use 
certain Greek letters for an unknown — an advance which, unfortunately, did 
not go anywhere for over a millennium.

While Diophantus studied much more complicated equations as well (as 
we will see), methods for solving equations like $6x + 4y = 2$ were pursued
throughout antiquity and the medieval period — see Historical remark 2.4.7. 

There are several main cases involved in the solution, as we see in the 
following theorem. 


¹Systems of equations with several variables have a very long pedigree in nearly every
culture we have documentation from; see Exercise 3.6.10 for just one exercise, and see
[E.5.3, Chapter 6] for some interesting historical examples, particularly the last couple.

²www-history.mcs.st-and.ac.uk/Biographies/Diophantus.html

Theorem 3.1.2 Solutions of Linear Diophantine Equations. Given
integers a, b, c, we wish to find all integer solutions x, y to ax + by = c.

Let d = gcd(a, b), unless a = b = 0 in which case let d = 0. We will consider 
cases by ease of generating solutions. 


1. When c is not a multiple of d (including if c ≠ d = 0), there is no
solution.


2. When a or b is zero (but not both) and the nonzero one divides c, there
are infinitely many solutions that require little work to obtain.


3. When a, b ≠ 0 and c = d, there are infinitely many solutions, but you
will need to first obtain one solution in order to generate the others.


4. When a, b ≠ 0 and c is a nontrivial multiple of d, there are infinitely
many solutions that are easiest to generate by means of a solution to
ax + by = d.

Proof. The details are in the following subsections. 


1. When c is not a multiple of d: Subsection 3.1.1 

2. When a or b is zero: Subsection 3.1.2 

3. When c = d: Subsection 3.1.3 

4. When c is a nontrivial multiple of d: Subsection 3.1.4 


You should definitely follow the steps with specific simple numbers to see how
each proof works. Examples 3.1.3 and 3.1.4 are good models. □


3.1.1 If c is not a multiple of gcd(a, b) 


When d ≠ 0, our previous theorems say that solving ax + by = c is impossible.
Can you see why? For instance, try it out with a = 6, b = 9, and c = 5.

Reading the statement of Theorem 3.1.2 carefully shows that this case
includes the situation where a = 0 = b but c ≠ 0. It is also an easy exercise
to show this is impossible. You can provide full details of all these things in 
Exercise 3.6.8. Don’t forget the division algorithm! 
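
The suggested numbers a = 6, b = 9, c = 5 can also be checked by brute force. The following is a minimal sketch in plain Python rather than one of the Sage cells; the helper name `values_in_window` and the search bound of 20 are illustrative assumptions. A finite search is of course not a proof, but it shows the pattern the proof explains: every value 6x + 9y is a multiple of 3, so 5 never appears.

```python
from math import gcd

def values_in_window(a, b, bound):
    """Set of all values a*x + b*y with |x|, |y| <= bound (hypothetical helper)."""
    return {a * x + b * y
            for x in range(-bound, bound + 1)
            for y in range(-bound, bound + 1)}

a, b, c = 6, 9, 5
d = gcd(a, b)
vals = values_in_window(a, b, 20)

print("gcd:", d)
print("is c ever reached?", c in vals)
print("every value a multiple of d?", all(v % d == 0 for v in vals))
```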


3.1.2 If a or b is zero 


Suppose b = 0, in which case d = gcd(a, b) = |a|. (Try a = 55 as an example.)
Then we are just solving ax = c, which has the integer solution x = c/a because
we already assumed that d divides c. All pairs (c/a, y) with integer y are solutions.


If a = 0 the answer is analogous; write it down for yourself as practice! 


3.1.3 If c = gcd(a, b) 


Suppose a, b ≠ 0 and c actually is the gcd of a and b ... then there is some
work to do. Follow along with a = 60, b = 42, and c = 6 if you wish.

Your first step should be to get that gcd d via the Euclidean algorithm.
Then you will be able to go backwards (i.e. using the Bezout identity 2.4.1) to
get one solution $(x_0, y_0)$. That is important, since now at least one solution
$ax_0 + by_0 = c$ is known.

The next step is the last one; write down the entire solution set:


$$x = x_0 + \frac{b}{d}n, \qquad y = y_0 - \frac{a}{d}n \qquad \text{for } n \in \mathbb{Z}.$$


There are three comments to make to finish the proof.


• First, look at the structure of the solutions. The constants a and b have
switched their ‘affiliation’ from x and y to y and x. Also note that x and
y have ± involved. It doesn’t really matter which is which (switch −n
for n to see why), but if they have the same sign it is wrong. (When in
doubt, try something and then check to see if the answers are right.)


• It’s easy to check that any particular solution works:


$$a\left(x_0 + \frac{b}{d}n\right) + b\left(y_0 - \frac{a}{d}n\right) = ax_0 + \frac{ab}{d}n + by_0 - \frac{ab}{d}n = ax_0 + by_0,$$


and $ax_0 + by_0 = c$ by hypothesis.


• Why does this give all solutions? First note that since the only common
divisors of a and b are divisors of d, the integers $\frac{a}{d}$ and $\frac{b}{d}$ must be relatively
prime.


Now pick another solution x = x′, y = y′, and let’s show it has the desired
form. Start with

$$ax' + by' = c = ax_0 + by_0$$

and gather terms (dividing through by d) so that

$$\frac{a}{d}(x' - x_0) = \frac{b}{d}(y_0 - y').$$

Since $\frac{b}{d}$ divides the right side, it divides the left side as well. Now we use
Proposition 2.4.10 and the observation in the previous paragraph to see
$\frac{b}{d}$ must divide the $x' - x_0$ factor of the left-hand side, so that there exists
an integer k such that

$$x' - x_0 = k\frac{b}{d}, \text{ which means } x' = x_0 + k\frac{b}{d},$$

which is exactly what we just said was the form of all solutions.


Example 3.1.3 An easy example: 6x + 4y = 2. Trial and error tells us
that 6x + 4y = 2 can be solved with $x_0 = 1$, $y_0 = -1$. Thus the full answer is


$$x = 1 + \frac{4}{2}n, \qquad y = -1 - \frac{6}{2}n,$$


which we may rewrite as


$$x = 1 + 2n, \qquad y = -1 - 3n, \qquad n \in \mathbb{Z}.$$
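
The two steps of this subsection (find one solution by running the Euclidean algorithm backwards, then write down the whole family) can be sketched in plain Python. This is an illustrative sketch, not one of the book's Sage cells: the function names are hypothetical, and it assumes the standard iterative form of the extended Euclidean algorithm with a, b both nonzero.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) >= 0 and a*x + b*y == d."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:                      # normalize so that the gcd is positive
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y

def solutions_when_c_is_gcd(a, b, n_values):
    """Solutions (x, y) of a*x + b*y = gcd(a, b), one for each n in n_values."""
    d, x0, y0 = extended_gcd(a, b)
    # x = x0 + (b/d) n and y = y0 - (a/d) n, as in the displayed solution set
    return [(x0 + (b // d) * n, y0 - (a // d) * n) for n in n_values]

# The follow-along numbers a = 60, b = 42, c = 6
for x, y in solutions_when_c_is_gcd(60, 42, range(-2, 3)):
    print(x, y, 60 * x + 42 * y)       # the last column is always 6
```

Applied to the easy example, `extended_gcd(6, 4)` happens to return the same particular solution found by trial and error, but any starting solution generates the same family.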


3.1.4 If c is a nontrivial multiple of the gcd 


Finally, what if c is not the greatest common divisor but we still have solutions
because d | c? (Follow along in Example 3.1.4 if you wish.) 


• First, we can write c = dm, where again d is the greatest common divisor.


• In Subsection 3.1.3 we just saw that there must be a solution for ax + by = d.
Take any solution $(x_0, y_0)$ to this equation.


• By hypothesis, $d = ax_0 + by_0$. Now multiply this by m to obtain

$$c = dm = ax_0 m + by_0 m = a(x_0 m) + b(y_0 m),$$

which shows $x = x_0 m$, $y = y_0 m$ is a solution to the original equation
ax + by = c.


• Finally, the surprise is that the full solution has the same form as in
Subsection 3.1.3:


$$x = x_0 m + \frac{b}{d}n, \qquad y = y_0 m - \frac{a}{d}n.$$


It is easy to check and the proof is very similar to the case c = d (see 
Exercise 3.6.9). Intuitively, the reason you don’t need the m in the 
fractions is because they will just cancel anyway. 


Example 3.1.4 Try to do 15x − 21y = 6, a slightly harder one. (Hint: d = 3;
what are c and m?)
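
All four cases of Theorem 3.1.2 fit in one short routine, which may be handy for checking hand computations such as the one in Example 3.1.4. This is a sketch in plain Python under stated assumptions: the function names and the return convention (a particular solution plus a step) are hypothetical choices, and a, b are assumed not both zero.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) >= 0 and a*x + b*y == d."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y

def solve_linear_diophantine(a, b, c):
    """Return None if a*x + b*y = c has no integer solution. Otherwise return
    (x0, y0, dx, dy) so that the solutions are (x0 + dx*n, y0 + dy*n) for n in Z."""
    if a == 0 and b == 0:
        raise ValueError("this sketch assumes a and b are not both zero")
    d, u, v = extended_gcd(a, b)       # a*u + b*v == d
    if c % d != 0:
        return None                    # c is not a multiple of d
    m = c // d                         # c = d*m
    return u * m, v * m, b // d, -(a // d)

x0, y0, dx, dy = solve_linear_diophantine(15, -21, 6)
for n in range(-2, 3):
    x, y = x0 + dx * n, y0 + dy * n
    print(x, y, 15 * x - 21 * y)       # the last column is always 6
```

Note that the step (b/d, −a/d) does not involve m, matching the remark that the m would just cancel.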


3.2 Geometry of Equations 


But just proving things are true and using them isn’t enough. Why is the 
theorem true, intuitively? I believe the right way to approach this is with 
geometry, as in the following figure. Then try out the interactive cell below to 
see how things change with different coefficients. 


Figure 3.2.1 Solutions to 3x + 2y = 10 with x, y < 10


@interact
def _(a=slider(-10,10,1,6), b=slider(-10,10,1,4),
      c=slider(-20,20,1,2), viewsize=slider(3,20,1,5)):
    # The line ax + by = c written as y = -(a/b)x + c/b (this needs b != 0)
    p = plot(-(a/b)*x + c/b, -viewsize, viewsize, plot_points=200)
    # All integer lattice points in the viewing window
    lattice_pts = [[i,j] for i in [-viewsize..viewsize]
                   for j in [-viewsize..viewsize]]
    plot_lattice_pts = points(lattice_pts, rgbcolor=(0,0,0), pointsize=2)
    if mod(c, gcd(a,b)) == 0:
        # Solutions exist; highlight the ones visible in this window
        line_pts = [coords for coords in lattice_pts
                    if a*coords[0] + b*coords[1] == c]
        if line_pts == []:
            plot_line_pts = Graphics()
        else:
            plot_line_pts = points(line_pts, rgbcolor=(0,0,1), pointsize=20)
        pretty_print(html("Showing solutions to $%sx+%sy=%s$ in this viewing window"
                          % (str(a), str(b), str(c))))
        show(p + plot_lattice_pts + plot_line_pts, figsize=[5,5],
             xmin=-viewsize, xmax=viewsize, ymin=-viewsize, ymax=viewsize)
    else:
        pretty_print(html("The gcd of $%s$ and $%s$ is $%s$, which does not divide $%s$,"
                          % (str(a), str(b), str(gcd(a,b)), str(c))))
        pretty_print(html("so no solutions to $%sx+%sy=%s$"
                          % (str(a), str(b), str(c))))
        show(p + plot_lattice_pts, figsize=[5,5],
             xmin=-viewsize, xmax=viewsize, ymin=-viewsize, ymax=viewsize)


The little gray dots in the graphic above are called the integer lattice; 
this is the collection of all the intersections of the lines y = m,x = n for all 
integers m,n. There are many mathematical lattices (many quite intimately 
connected to number theory), but we will focus on this one in this text. 


Definition 3.2.2 The integer lattice is the set of points (m, n) for m, n ∈ Z. ◊
In the graphic, for instance (−2, 3) is probably visible; on the other hand,
the point (−1, 1/2) should not have a little dot, because it doesn’t have integer
values.
This is a good occasion to remind the reader of some familiar terms and 
notation. 
Definition 3.2.3 We consider any ratio of integers p/q with q ≠ 0 to be a
rational number, with equivalent ratios such as 2/4 = 1/2 identified as in school
mathematics³. The set of all rationals is denoted Q. If a (real, R) number is
not writeable in these terms, it is called an irrational number. ◊

³That this is meaningful can be made rigorous using equivalence classes, as we will do
with modular arithmetic in Proposition 4.3.2, but that is outside the scope of this course.

To return to the lattice, since ax + by = c may be thought of as a line (in
fact, the line

$$y = -\frac{a}{b}x + \frac{c}{b}$$

with slope $-\frac{a}{b}$), we now have a completely different interpretation of the most
basic number theory question there is, the linear Diophantine equation. It is
simply asking, “When (for what a, b, c combinations) does the line hit this
lattice? If it does, can you tell me all intersections?” If you play around with
the sliders you will quickly see that things work out just as promised in the
theorems.

But let’s go a little deeper. There are three interesting insights we can get. 


• First, Theorem 3.1.2 now expresses a very mysterious geometric idea,
depending on whether

gcd(a, b) | c.

If so, then this line hits lots of the lattice points; if not, the line somehow
slides between every single one of them! You can check this by keeping
a, b the same and varying c in the interact above.


• Secondly, it makes the proof of why Theorem 3.1.2 gets all of the answers
much clearer. If you have one answer (for instance, (1, −1)) and go right
by the run and down by the rise in a/b (our example was a = 6, b = 4),
you hit another solution since it’s still all integers and the slope was the
line’s slope. Going left and up instead works too (perhaps here (−3, 5)).


But wait, couldn’t there be points in between? Sure. So make a/b into
lowest terms (e.g. 3/2), which would be (a/d)/(b/d). And this is the ‘smallest’ rise
over run that works to keep you on the line and keep you on integer
points.


• Third, it can help clarify the role of the solution which the Bezout identity
(extended Euclidean algorithm) gives for ax + by = c. Namely, as pointed
out in a 2013 American Mathematical Monthly article by S. A. Rankin
[E.7.21], the “solution provided ... lies nearest to the origin.” Try the
interactive cell at the beginning of this subsection to convince yourself of 
this! 
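
The second insight, that neighboring solutions differ by exactly the reduced run and rise, can be checked without any graphics. Here is a small sketch in plain Python; the function name and the window size of 5 are illustrative assumptions, with a = 6, b = 4, c = 2 taken from the running example.

```python
from math import gcd

def lattice_points_on_line(a, b, c, viewsize):
    """All integer points (x, y) with |x|, |y| <= viewsize on a*x + b*y = c."""
    return [(x, y)
            for x in range(-viewsize, viewsize + 1)
            for y in range(-viewsize, viewsize + 1)
            if a * x + b * y == c]

a, b, c = 6, 4, 2
d = gcd(a, b)
pts = lattice_points_on_line(a, b, c, 5)

# Differences between consecutive solutions, ordered by x-coordinate
steps = [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]

print(pts)
print(steps)                  # every step equals (b/d, -a/d)
print((b // d, -(a // d)))
```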


Although we won’t pursue it, there is a question which this formulation 
in an online text brings up. Namely, given that the ‘line’s in question are 
themselves only pixellated approximations whose coordinates may not satisfy 
ax + by = c, what is the connection between the computer graphics and the 
number theory? See How to Guard an Art Gallery [E.6.7], Chapter 4, for an
accessible take on this⁴ from a number-theoretic viewpoint, as well as
Exercise 3.6.23.


3.3 Positive Integer Lattice Points 


Now that we have the geometric viewpoint, here is a more subtle question: 


Question 3.3.1 Assume there exists a solution (hence infinitely many) to 
ax + by = c. How many such solution pairs (x, y) have x and y both positive?


This is similar to the conductor question. It is closely related to integer 
programming, something with industrial applications. 


⁴As well as several other topics in this text! But you’ll have to read it to find out which
ones.

Figure 3.3.2 Positive solutions to 3x + 2y = 10

@interact
def _(a=slider(1,20,1,1), b=slider(1,20,1,1), c=slider(1,20,1,4)):
    # Viewing window reaching just past the intercepts c/a and c/b
    ym = c/b + 1
    xm = c/a + 1
    p = plot(-(a/b)*x + c/b, -1, xm, plot_points=200)
    lattice_pts = [[i,j] for i in [0..xm] for j in [0..ym]]
    plot_lattice_pts = points(lattice_pts, rgbcolor=(0,0,0), pointsize=2)
    if mod(c, gcd(a,b)) == 0:
        # Keep only the solutions with both coordinates strictly positive
        line_pts = [coords for coords in lattice_pts
                    if (coords[0] > 0) and (coords[1] > 0)
                    and (a*coords[0] + b*coords[1] == c)]
        if len(line_pts) == 0:
            pretty_print(html('Solutions to $%sx+%sy=%s$:' % (str(a), str(b), str(c))))
            pretty_print(html('No positive lattice points at all!'))
            show(p + plot_lattice_pts, figsize=[5,5],
                 xmin=0, xmax=xm, ymin=0, ymax=ym)
        else:
            plot_line_pts = points(line_pts, rgbcolor=(0,0,1), pointsize=20)
            pretty_print(html('Solutions to $%sx+%sy=%s$:' % (str(a), str(b), str(c))))
            pretty_print(html('Number of positive lattice points = '
                              + str(len(line_pts))))
            show(p + plot_lattice_pts + plot_line_pts, figsize=[5,5],
                 xmin=0, xmax=xm, ymin=0, ymax=ym)
    else:
        pretty_print(html('Solutions to $%sx+%sy=%s$:' % (str(a), str(b), str(c))))
        pretty_print(html('No positive lattice points at all!'))
        show(p + plot_lattice_pts, figsize=[5,5],
             xmin=0, xmax=xm, ymin=0, ymax=ym)


Let’s explore this. How many such points are there in the following cases? 
Draw pictures by hand, or use the interact above. 


• x + y = 4, x + y = 5, x + y = 6, ...
• 2x + y = 4, 2x + y = 5, 2x + y = 6, ...
• 2x + 2y = 4, 2x + 2y = 5, 2x + 2y = 6, ...
• 3x + y = 4, 3x + y = 5, 3x + y = 6, ...


Can you get any good conjectures? 
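
If you would rather tabulate than draw, a brute-force count is enough for exploring these families. The sketch below is plain Python, not one of the Sage cells; the function name `count_positive_solutions` is a hypothetical helper, and it assumes a, b > 0.

```python
def count_positive_solutions(a, b, c):
    """Number of pairs (x, y) of positive integers with a*x + b*y == c (a, b > 0)."""
    count = 0
    x = 1
    while a * x < c:
        rest = c - a * x          # this must equal b*y for some y >= 1
        if rest % b == 0:
            count += 1
        x += 1
    return count

# The four families suggested above, for c = 4, 5, ..., 12
for a, b in [(1, 1), (2, 1), (2, 2), (3, 1)]:
    counts = [count_positive_solutions(a, b, c) for c in range(4, 13)]
    print(f"{a}x + {b}y = c: {counts}")
```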


3.3.1 Solution ideas 


If you think about the question a little more carefully together with the picture, 
you may realize that we are really asking about how many integer lattice points 
lie between the intercepts. So one way to think about an answer would involve 
the distance between solutions. 

To be concrete, let’s assume that the equation is ax + by = c, and gcd(a, b) = 1.
Then, using our technique from last time, from the solution $(x_0, y_0)$ we get
a new solution $(x_0 + b, y_0 - a)$, so the distance between any two neighboring
solutions is, by the Pythagorean Theorem,


$$\sqrt{[(x_0 + b) - x_0]^2 + [(y_0 - a) - y_0]^2} = \sqrt{a^2 + b^2}.$$


Our strategy is to ask: 


• How many times does that distance fit between the intercepts of the line?


Does that strategy make sense? It doesn’t give an exact answer, but should 
give a good ballpark estimate. 
Let’s calculate these things. You may want to follow along with a = 3, b = 2, c = 4.


• The intercepts are $\frac{c}{a}$ and $\frac{c}{b}$, respectively.


• Using the Pythagorean Theorem again, we see that the whole length
available is

$$\sqrt{\left(\frac{c}{a}\right)^2 + \left(\frac{c}{b}\right)^2} = \frac{c}{ab}\sqrt{a^2 + b^2}.$$


• The ratio of this total length and the length between solutions is thus
$\frac{c}{ab}$.


That’s a nice pat answer. There are two problems with it, though! 


1. There is no guarantee that $\frac{c}{ab}$ is an integer! In fact, it usually won’t be.
For instance, with 2x + 3y = 10, $\frac{c}{ab} \approx 1.67$. So should the number of
points be bigger than or less than this?


2. Secondly, even so it’s not clear what the precise connection between $\frac{c}{ab}$
and the actual number of points is. 2x + 3y = 5 has one, and 2x + 3y = 7
has one, but 2x + 3y = 6 doesn’t. Yet $\frac{c}{ab}$ is about equal to one for all
three of these. In fact, the number of points is thus not even monotone
increasing with respect to c increasing, which is rather counterintuitive.
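
Both the length computation and the two problems just listed can be seen numerically. The following plain Python sketch uses hypothetical helper names and assumes a, b, c > 0 with gcd(a, b) = 1; it compares the ratio of lengths with a brute-force count of positive solutions of 2x + 3y = c.

```python
from math import hypot, isclose

def length_ratio(a, b, c):
    """(distance between the intercepts) / (distance between neighboring
    solutions) for a*x + b*y = c, assuming a, b, c > 0 and gcd(a, b) = 1."""
    total = hypot(c / a, c / b)     # intercepts are (c/a, 0) and (0, c/b)
    step = hypot(a, b)              # neighbors differ by (b, -a)
    return total / step

def count_positive_solutions(a, b, c):
    """Brute-force count of positive integer solutions of a*x + b*y == c."""
    return sum(1 for x in range(1, c // a + 1)
               if c - a * x > 0 and (c - a * x) % b == 0)

for c in [5, 6, 7, 10]:
    ratio = length_ratio(2, 3, c)
    assert isclose(ratio, c / (2 * 3))   # agrees with c/(ab)
    print(c, round(ratio, 2), count_positive_solutions(2, 3, c))
```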


We will have to deal with each of these situations.


3.3.2 Toward the full solution


We can deal with each of these problems. To do so, we introduce a new function:


Definition 3.3.3 Greatest integer function. The greatest integer function 
(also called the floor function) is the function which takes a real number x and 
returns the largest integer below it (or equal to it). We notate it $\lfloor x \rfloor$. ◊


Example 3.3.4 A few examples should suffice to understand it: 


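$$\lfloor 1.5 \rfloor = 1, \quad \lfloor 1 \rfloor = 1, \quad \lfloor 1.99 \rfloor = 1, \quad \lfloor 0.99 \rfloor = 0, \quad \lfloor -.01 \rfloor = -1.$$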
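
The same examples can be reproduced in plain Python with the standard library's `math.floor`. The short sketch below also illustrates one pitfall, as a side remark rather than something from the text: `int()` truncates toward zero, so it disagrees with the floor on negative inputs.

```python
import math

examples = [1.5, 1, 1.99, 0.99, -0.01]
floors = [math.floor(value) for value in examples]
print(floors)

# int() truncates toward zero, which differs from the floor for negatives
print(int(-0.01), math.floor(-0.01))

# For integers, // is floor division, so c // (a*b) is the floor of c/(ab)
a, b, c = 2, 3, 10
print(c // (a * b))
```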


Now let’s use this to rectify our problems. 


1. To take care of the integer problem, we will just consider $n = \left\lfloor \frac{c}{ab} \right\rfloor$, the
greatest integer function applied to $\frac{c}{ab}$.

2. Secondly, we simply recognize that there isn’t a nice formula. On average, 
we should expect n lengths between integer points along the line segment 
in question (and hence as many as n + 1 lattice points, since a partition 
of n intervals has n + 1 endpoints associated to it). 


Rather than give a general formula, we examine individual cases to show 
what to expect. This applet can help supplement trying it by hand. 


@interact
def _(c=[5..12]):
    a = 2
    b = 3
    # Viewing window reaching just past the intercepts c/a and c/b
    ym = c/b + 1
    xm = c/a + 1
    p = plot(-(a/b)*x + c/b, -1, xm, plot_points=200)
    lattice_pts = [[i,j] for i in [0..xm] for j in [0..ym]]
    plot_lattice_pts = points(lattice_pts, rgbcolor=(0,0,0), pointsize=2)
    if mod(c, gcd(a,b)) == 0:
        # Keep only the solutions with both coordinates strictly positive
        line_pts = [coords for coords in lattice_pts
                    if (coords[0] > 0) and (coords[1] > 0)
                    and (a*coords[0] + b*coords[1] == c)]
        if len(line_pts) == 0:
            pretty_print(html('Solutions to $%sx+%sy=%s$:' % (str(a), str(b), str(c))))
            pretty_print(html('No positive lattice points at all!'))
            show(p + plot_lattice_pts, figsize=[5,5],
                 xmin=0, xmax=xm, ymin=0, ymax=ym)
        else:
            plot_line_pts = points(line_pts, rgbcolor=(0,0,1), pointsize=20)
            pretty_print(html('Solutions to $%sx+%sy=%s$:' % (str(a), str(b), str(c))))
            pretty_print(html('Number of positive lattice points = '
                              + str(len(line_pts))))
            show(p + plot_lattice_pts + plot_line_pts, figsize=[5,5],
                 xmin=0, xmax=xm, ymin=0, ymax=ym)
    else:
        pretty_print(html('Solutions to $%sx+%sy=%s$:' % (str(a), str(b), str(c))))
        pretty_print(html('No positive lattice points at all!'))
        show(p + plot_lattice_pts, figsize=[5,5],
             xmin=0, xmax=xm, ymin=0, ymax=ym)


Let’s focus on the case where a, b > 0 are relatively prime, such as in the
graphic with 2x + 3y = c for various c. Naturally, if c < 6 in this specific
example, then $n = \left\lfloor \frac{c}{ab} \right\rfloor = 0$, so one might not expect many points. What
about in general?


1. The easiest case is when just one of the intercepts is a lattice point. 
Beginning at that point, there is definitely room for the full n lengths to 
appear, and you’re guaranteed to get n lattice points, because we just 
said the other intercept isn’t a lattice point, so the nth one must appear 
before that point. So the formula is just plain old


$$n = \left\lfloor \frac{c}{ab} \right\rfloor.$$


This will happen (where n = 1) with 2x+3y = 8 (or 9 or 10), for instance. 


2. If neither c/a nor c/b is an integer, then you could get n or n + 1 lattice
points. There’s no nice formula beyond this, and often examples will
be like 2x + 3y = 7 with just one lattice point as ‘expected’. The
extra point ‘fits’ in examples like the case 2x + 3y = 11, where we have
$\frac{11}{6} - \left\lfloor \frac{11}{6} \right\rfloor$ very close to one, and you do get $\left\lfloor \frac{11}{6} \right\rfloor + 1 = 2$ positive lattice
points here.


3. Finally, it’s also possible for ‘not enough’ lattice points to fit; for example,
2x + 3y = 12 jumps back down to $\left\lfloor \frac{12}{6} \right\rfloor - 1 = 1$ point! This situation
(not reaching n points) can occur when both the x- and y-intercepts
actually are lattice points, because the intercepts by definition do not
have positive coordinates. So if c/a and c/b are both integers, then we
get precisely

$$n - 1 = \left\lfloor \frac{c}{ab} \right\rfloor - 1$$

lattice points.
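
The three cases can be tested against a brute-force count. The sketch below is plain Python with hypothetical function names, and it assumes a, b > 0 are relatively prime; `predicted_counts` returns the set of counts the case analysis allows, and the loop prints c, n, the actual count, and the allowed counts for 2x + 3y = c.

```python
def count_positive_solutions(a, b, c):
    """Brute-force count of positive integer solutions of a*x + b*y == c."""
    return sum(1 for x in range(1, c // a + 1)
               if c - a * x > 0 and (c - a * x) % b == 0)

def predicted_counts(a, b, c):
    """Counts allowed by the three cases above (assumes gcd(a, b) = 1)."""
    n = c // (a * b)                 # floor of c/(ab)
    x_intercept_is_lattice = (c % a == 0)
    y_intercept_is_lattice = (c % b == 0)
    if x_intercept_is_lattice and y_intercept_is_lattice:
        return {n - 1}               # case 3: both intercepts are lattice points
    if x_intercept_is_lattice or y_intercept_is_lattice:
        return {n}                   # case 1: exactly one intercept is
    return {n, n + 1}                # case 2: neither intercept is

a, b = 2, 3
for c in range(5, 13):
    actual = count_positive_solutions(a, b, c)
    print(c, c // (a * b), actual, sorted(predicted_counts(a, b, c)))
```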


As a side note, the number of points not being a monotone nondecreasing
function of c should always be expected when c transitions to being a multiple
of ab, such as also from 2x + 3y = 5 to 2x + 3y = 6. In fact, since the closest
solution to the origin of ax + by = −1 must be no more than one half the
usual distance $\sqrt{a^2 + b^2}$ away (cf. also [E.7.21]), all (positive) solutions of
ax + by = kab will yield (positive) solutions to ax + by = kab − 1, as will one
of the intercepts. See Exercise 3.6.24 to fill in the details.

The excellent book The Geometry of Numbers [E.4.16, Section 2.2] gives
many more details. For instance, if gcd(a, b) ≠ 1, it is not too hard to show that
any such line with respect to lattice points is the same as a line a′x + b′y = c′
for which gcd(a′, b′) = 1. Which line would that be?


3.4 Pythagorean Triples 


3.4.1 Definition 


There are a lot of other interesting questions that one can ask about pure
integers, and polynomial equations they might satisfy (so-called Diophantine
equations). However, answering many of those questions will prove challenging
without additional tools, so we will have to take a detour soon. But one 
such question is truly ancient, and worth exploring more in this chapter, as a 
representative of questions involving quadratic terms. 

The question we will examine is also quite geometric. We just used the 
Pythagorean Theorem above, but you’ll note that we didn’t really care whether 
the hypotenuse was an integer there. Well, when is it? More precisely: 


Question 3.4.1 When are all three sides of a right triangle integers? 


Definition 3.4.2 We call a triple of integers x, y, z such that $x^2 + y^2 = z^2$ a
Pythagorean triple. ◊
There isn’t necessarily evidence that Pythagoras thought this way about
them. However, Euclid certainly did⁵, and so will we. For that matter, we
should also think of them as x, y, z that fit on the quadratic curve $x^2 + y^2 = z^2$,
given z ahead of time.
Let’s try this out for a little bit — on paper or with this applet. When do 
we get a triple? (Keep in mind that we will always expect the triples (z, 0, z)
and (0, z, z) where $0^2 + z^2 = z^2$, but that’s not really what we are interested
in.) 


@interact
def _(z=(2,[1..100])):
    # Points on this curve are the (x, y) with x^2 + y^2 = z^2
    f(x,y) = x^2 + y^2 - z^2
    max = z
    p = implicit_plot(f, (x,-1,max), (y,-1,max), plot_points=200)
    lattice_pts = [[i,j] for i in [0..max] for j in [0..max]]
    plot_lattice_pts = points(lattice_pts, rgbcolor=(0,0,0), pointsize=2)
    # Lattice points lying exactly on the curve
    curve_pts = [coords for coords in lattice_pts
                 if f(coords[0], coords[1]) == 0]
    if len(curve_pts) == 0:
        show(p + plot_lattice_pts, figsize=[5,5], aspect_ratio=1)
    else:
        plot_curve_pts = points(curve_pts, rgbcolor=(0,0,1), pointsize=20)
        show(p + plot_lattice_pts + plot_curve_pts, figsize=[5,5],
             aspect_ratio=1)


3.4.2 Characterizing Pythagorean triples 


When exploring, it can seem quite unpredictable for which z there exists a 
Pythagorean triple! (We’ll return to that question later.) Let’s see what triples 
are possible overall. 


3.4.2.1 Preliminaries 


First, it turns out we really only need to worry about the case when x, y, z are
mutually relatively prime (Definition 2.4.9). 


Definition 3.4.3 A Pythagorean triple with x, y, z mutually relatively prime
is called a primitive Pythagorean triple. ◊


⁵aleph0.clarku.edu/~djoyce/java/elements/bookX/propX29.html

Proposition 3.4.4 Any Pythagorean triple with two numbers sharing a factor
can be reduced to a primitive triple.
Proof. If x = x′a and y = y′a, for instance, then


$$x^2 + y^2 = (x')^2 a^2 + (y')^2 a^2 = z^2,$$


which means that $a^2 \mid z^2$, and hence that $a \mid z$ as well. The other cases are
similar. (One can prove the last statement with the gcd and Bezout as well,
but I trust you believe it for now. See below in Proposition 3.7.1.) □

So let’s consider just the case of primitive triples. In just a little while we 
will discover we have the proof of a result, Theorem 3.4.6. 

We can start with very elementary considerations of even and odd. By the 
previous proposition, x and y can’t both be even. 

I claim they can’t both be odd, either. For if they were, we would have
x = 2k + 1 and y = 2ℓ + 1 for some integers k, ℓ, and then


$$(2k + 1)^2 + (2\ell + 1)^2 = 4(k^2 + \ell^2 + k + \ell) + 2.$$


But this contradicts Proposition 2.1.4 with respect to the remainder of a perfect 
square when divided by four. 

So we may assume without loss of generality that x is odd and y is even, 
(which means z is odd). 
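
The parity argument rests on remainders modulo 4, which are easy to check numerically. This is a brief sketch in plain Python; the ranges are arbitrary illustrative choices, and a finite check only illustrates the general fact rather than proving it.

```python
# Possible remainders of a perfect square on division by 4
square_remainders = {(n * n) % 4 for n in range(-50, 51)}
print(square_remainders)

# Remainders of (odd square) + (odd square) on division by 4
odd_numbers = range(-49, 50, 2)
odd_sum_remainders = {(x * x + y * y) % 4
                      for x in odd_numbers for y in odd_numbers}
print(odd_sum_remainders)

# The two sets are disjoint, so a sum of two odd squares is never a square
print(square_remainders.isdisjoint(odd_sum_remainders))
```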


3.4.2.2 An intricate argument 


We have now reduced our investigation to the following case: we assume that 
gcd(x, y, z) = 1, that x, z are odd, and that y is even. Now we will do a
somewhat intricate, but familiar, type of argument about factorization and
divisibility.

Let’s rewrite our situation as


$$y^2 = z^2 - x^2.$$


The right-hand side factors as

$$z^2 - x^2 = (z - x)(z + x).$$


Certainly z − x and z + x are both even, so that z − x = 2m and z + x = 2n
for integers m, n. Their product 2m · 2n = 4mn equals $y^2$, so it is a perfect
square. Since y is even, y = 2j for some j ∈ Z
and $y^2 = 4j^2$, so $mn = j^2$ is a perfect square.

Let’s look at these mysterious factors $m = \frac{z - x}{2}$ and $n = \frac{z + x}{2}$. Are they
relatively prime? Well, if they shared a factor, then x = n − m and z = m + n
also share that factor. But gcd(x, z) = 1 (a common prime factor of x and z
would divide $y^2 = z^2 - x^2$, hence y), so there are no such factors and
gcd(m, n) = 1.

As a result, not only do we have $j^2 = mn$, but actually m and n are relatively
prime!

At this point we need what may seem to be an intuitive fact about squares
and division; if coprime integers make a square when multiplied, then they are
each a perfect square. (See Proposition 3.7.2.) So $m = p^2$ and $n = q^2$ for some
integers (obviously coprime) p and q.

This clearly implies that $j^2 = p^2 q^2$, so y = 2pq. In addition, if we go back
to the definitions of m, n above, we obtain $z - x = 2p^2$ and $z + x = 2q^2$.
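
The argument is constructive, so it can be run on actual triples. The sketch below, in plain Python, follows the steps above for a primitive triple with x odd and y even and recovers p and q; the function name is hypothetical, positive entries are assumed, and `math.isqrt` (Python 3.8 or later) supplies the integer square roots.

```python
from math import gcd, isqrt

def generators_from_triple(x, y, z):
    """For a primitive triple with x odd and y even, return (p, q) with
    z - x = 2p^2, z + x = 2q^2 and y = 2pq, following the argument above."""
    assert x * x + y * y == z * z, "not a Pythagorean triple"
    assert gcd(gcd(x, y), z) == 1, "not primitive"
    assert x % 2 == 1 and y % 2 == 0, "expected x odd and y even"
    m, n = (z - x) // 2, (z + x) // 2    # coprime, with m*n = (y/2)^2
    p, q = isqrt(m), isqrt(n)
    assert p * p == m and q * q == n     # both really are perfect squares
    assert 2 * p * q == y
    return p, q

for triple in [(3, 4, 5), (15, 8, 17), (5, 12, 13), (21, 20, 29)]:
    print(triple, generators_from_triple(*triple))
```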
3.4.2.3 The punch line


Now we can put everything together. We begin with a useful definition. 


Definition 3.4.5 We say two integers p,q have opposite parity if one is even 
and the other is odd, and we say they have the same parity otherwise. ◊


Theorem 3.4.6 Characterization of primitive Pythagorean triples. 
For a primitive triple x,y,z, where x is odd, there exist integers p,q such that 


$$z = p^2 + q^2, \quad x = q^2 - p^2, \quad \text{and } y = 2pq.$$


Further, p and q must have opposite parity as well as be coprime. 


Algorithm 3.4.7 We can find all primitive Pythagorean triples by finding 
coprime integers p and q which have opposite parity, and then using the
formula in Theorem 3.4.6. We can obtain all other Pythagorean triples by
multiplying primitive triples by an integer greater than one.

It’s really worth trying to find these by hand; it gives one a very good sense 
of how this all works. 

Of course, you could generate some by computer as well ... 


n = 10
# Coprime pairs p < q of opposite parity
Generators = [(p,q) for p in range(1,n) for q in range(p+1,n)
              if (gcd(p,q)==1) and not (mod(p,2)==mod(q,2))]
for pairs in Generators:
    x = pairs[1]^2 - pairs[0]^2; y = 2*pairs[0]*pairs[1]; z = pairs[0]^2 + pairs[1]^2
    print('%s squared plus %s squared is %s squared - %s'
          % (x, y, z, x^2 + y^2 == z^2))


3 squared plus 4 squared is 5 squared - True 
15 squared plus 8 squared is 17 squared - True 


15 squared plus 112 squared is 113 squared - True 
17 squared plus 144 squared is 145 squared - True 


Remark 3.4.8 One can find many infinite subfamilies of Pythagorean triples. 
A nice brief article by Roger Nelsen [E.7.18] shows that there are infinitely
many Pythagorean triples giving nearly isosceles triangles (where the smaller 
sides are just one unit different). What families can you find? 

Similarly, there are other ways to get the entire family of Pythagorean 
triples. Theorem 4 of [E.7.42] generates primitive triples via pairs a,b of odd 
coprime positive integers; see Exercise 3.6.25. 


3.4.3 Areas of Pythagorean triangles 


3.4.3.1 Which areas are possible? 


Historically, one of the big questions one could ask about such Pythagorean 
integer triangles was about its area. For primitive ones, the legs must have 
opposite parity (do you remember why?), so the areas will be integers. (For 
ones which are not primitive, the sides are multiples of sides with opposite 
parity, so they are certainly also going to have an integer area.) 

So what integers work? You all know one such triangle with area 6, and 
it should be clear that ones with area 1 and 2 can’t work (because the sides 
would be too small and because 2, 1 doesn’t lead to a triple); can you find ones 
with other areas?

n = 10
# Coprime pairs p < q of opposite parity
Generators = [(p,q) for p in range(1,n) for q in range(p+1,n)
              if (gcd(p,q)==1) and not (mod(p,2)==mod(q,2))]
for pairs in Generators:
    x = pairs[1]^2 - pairs[0]^2; y = 2*pairs[0]*pairs[1]; z = pairs[0]^2 + pairs[1]^2
    print('The primitive triple %s gives a triangle of area %s'
          % ((x,y,z), x*y/2))


The primitive triple (3, 4, 5) gives a triangle of area 6 
The primitive triple (15, 8, 17) gives a triangle of area 60 


The primitive triple (15, 112, 113) gives a triangle of 
area 840 

The primitive triple (17, 144, 145) gives a triangle of 
area 1224 


It is worth asking why there are no odd numbers in the list so far. In fact, 
we can prove quite a bit about these things. 

Remember that in a primitive triple, x and y can be written as $x = q^2 - p^2$
while y = 2pq, for relatively prime opposite parity q > p. Then the area must
be

$$pq(q^2 - p^2) = pq(q + p)(q - p).$$


So can the area be odd? The following proposition helps answer this (Exer- 
cise 3.6.15) and many other questions. 


Proposition 3.4.9 In a primitive Pythagorean triple given by the formula in
Theorem 3.4.6, the area of the corresponding triangle is $pq(q^2 - p^2)$. In addition,
the four factors of the area


$$pq(q + p)(q - p)$$


must all be relatively prime to each other.
Proof. We already know that p and q are coprime, and that this is the correct
formula for the area.
The factors p and p + q must also share no factors, since any factor they share
is shared by (p + q) − p = q, but gcd(p, q) = 1. The same argument will work
in showing that p and q − p are coprime, as well as q and either the sum q + p
or the difference q − p.
If q + p and q − p share a factor, since they are odd it must be odd, and it must
be a factor of their sum and difference 2q and 2p. Since the putative factor is
odd, it is coprime to 2, and so we can use Proposition 2.4.10 to say that it is
a factor of both p and q, which is impossible unless said factor is 1. □
So one could analyze a number to see if it is possible to write as a product 
of four relatively prime integers as a starting point. For example, the only 
way to write 30 in such a way (assuming no more than one of them is 1) is 
30 = 2·3·5·1. Since q + p must be the biggest, we must set q + p = 5. Quickly
one can see that q = 3, p = 2 works with this, so there is such a triangle. (A
quick exercise is to determine the sides of this triangle.) See Exercise 3.6.16. 
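
Both Proposition 3.4.9 and the search for generators of a given area can be explored by machine. The sketch below is plain Python with hypothetical helper names; the bounds 30 and 50 are arbitrary, and the search only covers primitive triangles arising from the formula in Theorem 3.4.6.

```python
from math import gcd
from itertools import combinations

def area_factors(p, q):
    """The four factors p, q, q + p, q - p of the area pq(q + p)(q - p)."""
    return [p, q, q + p, q - p]

def generators_for_area(area, bound=50):
    """Coprime, opposite-parity pairs p < q < bound with pq(q+p)(q-p) == area."""
    return [(p, q)
            for q in range(2, bound)
            for p in range(1, q)
            if gcd(p, q) == 1 and (p + q) % 2 == 1
            and p * q * (q + p) * (q - p) == area]

# Pairwise coprimality of the four factors, for all small generator pairs
all_coprime = all(gcd(s, t) == 1
                  for q in range(2, 30) for p in range(1, q)
                  if gcd(p, q) == 1 and (p + q) % 2 == 1
                  for s, t in combinations(area_factors(p, q), 2))
print(all_coprime)

print(generators_for_area(30))   # the pair (p, q) found in the text
print(generators_for_area(6))
```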
Trying to see if an integer is the area of a Pythagorean triangle turns out to 
be a very deep unsolved problem. This linked news update from the American
Institute of Mathematics⁶ gives some background on the congruent number
problem, which asks the related question of which Pythagorean triangles with
rational side lengths give integer areas. This linked page⁷ in particular is
interesting from our present point of view. 


⁶www.aimath.org/news/congruentnumbers/
⁷www.aimath.org/news/congruentnumbers/ecconnection.html

3.4.3.2 Which areas are square?


But we can ask another question, which led Fermat (see Historical remark 13.0.4) 
to some of his initial investigations into this theory. 


Question 3.4.10 When is the area of a Pythagorean triple triangle a perfect 
square? 


@interact
def _(n=20):
    # Coprime pairs p < q of opposite parity
    Generators = [(p,q) for p in range(1,n) for q in range(p+1,n)
                  if (gcd(p,q)==1) and not (mod(p,2)==mod(q,2))]
    found = []   # triples whose triangle has square area
    for pairs in Generators:
        x = pairs[1]^2 - pairs[0]^2; y = 2*pairs[0]*pairs[1]
        z = pairs[0]^2 + pairs[1]^2
        if is_square(x*y/2):
            pretty_print(html('The primitive triple $%s,%s,%s$ gives a triangle of square area $%s$'
                              % (x, y, z, x*y/2)))
            found.append((x,y,z))
    if not found:
        pretty_print(html(r"No triangles of square area up to $p,q\leq %s$!" % (n,)))


You'll notice by the empty output that we don’t seem to be getting a lot 
of these. In fact, none. What would we need to do to investigate this? 

In the previous section, we noted that each of the factors in the area,
$pq(q^2 - p^2) = pq(q + p)(q - p)$, are relatively prime to each other. So if the area is also
a perfect square, then since the factors are coprime, we use Proposition 3.7.2
again to see they themselves are all perfect squares!

Now we will do something very clever. It is a proof strategy, similar to 
something the Greeks used occasionally, which Fermat used for many of his 
proofs, called infinite descent. We are going to take that (hypothetical) 
triangle, and produce a triangle with strictly smaller sides but otherwise with 
the same properties — including integer sides and square area! That means 
we could apply the same argument to our new triangle, and then the next 
one ... But the Well-Ordering Principle (Axiom 1.2.1) won’t allow infinite sets 
of positive integers less than a certain number — which yields the name of 
the proof technique! Then (by way of contradiction) the original triangle was 
impossible to begin with. 

So let’s make that smaller triangle! 


Proposition 3.4.11 If a primitive Pythagorean triangle with sides x,y, z, 
where the hypotenuse is z, has area a perfect square, we can create another 
one of strictly smaller hypotenuse length. 

Proof. We use the same notation as in Proposition 3.4.9. We know that $q+p$ 
and $q-p$ are (odd) squares. Call them $u^2$ and $v^2$. Note we can write $u$ and 
$v$ as $\frac{u+v}{2} + \frac{u-v}{2}$ and $\frac{u+v}{2} - \frac{u-v}{2}$ (the 
terms of which are integers since $u$ and $v$ have the same parity). 

Letting $a = \frac{u+v}{2}$ and $b = \frac{u-v}{2}$, we have that 
$q+p = (a+b)^2$ and $q-p = (a-b)^2$. 
Then a little algebra (do it slowly if you don’t see it right away) shows that 
$q = a^2+b^2$ and $p = 2ab$. These are both squares, so $a^2+b^2 = q = c^2$ (!), 
which defines a triangle with area $\frac{ab}{2} = \frac{2ab}{4} = \frac{p}{4}$, 
another perfect square (do you see why?). 


Now let’s compare $c$ and $z$. We have $z = q^2+p^2 = (c^2)^2+p^2 = c^4+p^2$, so 
that unless $p = 0$, $c$ is strictly less than $z$. But $p = 0$ doesn’t give a 
triangle at all! So we have our strictly smaller triangle satisfying the same 
properties. ∎ 


Corollary 3.4.12 No Pythagorean triangles can have area a perfect square. 
Proof. If so, we can use the previous proposition infinitely often and violate 
Axiom 1.2.1, a contradiction. ∎ 


Corollary 3.4.13 No nonzero difference of nonzero perfect fourth powers can 
be a perfect square. That is, 

$$v^4 - u^4 = t^2$$

cannot be solved in positive integers. 
Proof. It suffices to consider $u,v$ coprime. In the previous proposition and 
corollary, we really showed that if $q-p$, $q+p$, $q$, $p$ are all perfect squares 
(coming from the area of the triangle) then this leads to a strictly smaller (and 
hence impossible, by infinite descent) set with the same property, since the area of 
the smaller triangle is a product of coprime squares of the same form. If we let 
$p = u^2$ and $q = v^2$, then we are in precisely this situation, as long as 
$q-p$, $q+p$ are coprime. 
The only difference is that here even if $p,q$ are coprime, it’s possible that both 
are odd, so that $q-p$, $q+p$ only have the same (even) parity. However (viz. 
[E.2.16, Lemma 7.7.3]), 2 is the only divisor they can share without passing a 
common divisor on to $p,q$, so that we still have $q+p = v^2+u^2 = 2f^2$ and 
$q-p = v^2-u^2 = 2g^2$ where $f,g$ are themselves coprime. Then some quick algebra 
shows $v^2 = f^2+g^2$ and $u^2 = f^2-g^2$, so that the set 
$f^2, g^2, f^2+g^2, f^2-g^2$ are all perfect squares, an impossibility. ∎ 
In Exercise 3.6.17 you will use this to prove the famous first case of Fermat’s 
Last Theorem: There are no three positive integers x, y, z such that 


$$x^4 + y^4 = z^4.$$


See also Subsection 14.2.2. 


See [E.5.9] and nearly any generalist math journal for a lot more information 
on Pythagorean triples; the search is the reward! 


3.5 Surprises in Integer Equations 


This chapter has discussed linear and quadratic Diophantine equations. As 
you can see, even relatively simple questions become much harder once you 
have to restrict yourself to integer solutions. And doing it without any more 
tools becomes increasingly unwieldy. 

But there is one final example of a question we can at least touch on. 
Recall that Pythagorean triples come, at their heart, from the observation 
that $3^2 + 4^2 = 5^2$. This is an interesting coincidence of powers involving 
nearby numbers, in this case perfect squares. So too, we can notice that $3^2$ 
and $2^3$ are only one apart, and $5^2$ and $3^3$ are only two units apart; a 
perfect square and a perfect cube are close together. 

As usual, we can think of this graphically, using the integer lattice. 
Figure 3.5.1 Solutions to $x^3 = y^2 - 1$ 

@interact 
def _(k=(-1,[-25..5])): 
    # The curve x^3 = y^2 + k, written as f(x,y) = 0 
    f(x,y) = y^2-x^3+k 
    # The plot_points value was lost in extraction; 200 is an assumed placeholder 
    p = implicit_plot(f,(x,-4,4),(y,-8,8),plot_points = 200) 
    lattice_pts = [[i,j] for i in [-4..4] for j in [-8..8]] 
    plot_lattice_pts = points(lattice_pts,rgbcolor=(0,0,0),pointsize=2) 
    # Lattice points in the window that lie exactly on the curve 
    curve_pts = [coords for coords in lattice_pts if f(coords[0],coords[1])==0] 
    if len(curve_pts)==0: 
        show(p+plot_lattice_pts, figsize = [5,5], aspect_ratio=1) 
    else: 
        plot_curve_pts = points(curve_pts, rgbcolor = (0,0,1),pointsize=20) 
        show(p+plot_lattice_pts+plot_curve_pts, figsize = [5,5], aspect_ratio=1) 
    if k>0: 
        pretty_print(html("Solutions of $x^3=y^2+%s$ in this viewing window"%(k,))) 
    if k<0: 
        pretty_print(html("Solutions of $x^3=y^2-%s$ in this viewing window"%(-k,))) 
    if k==0: 
        pretty_print(html("Solutions of $x^3=y^2$ in this viewing window")) 

The general form $x^3 = y^2 + k$ in the preceding interact is known as either 
a Bachet equation or a Mordell equation. We will use the latter for the 
general form and reserve the former only for the special case $k = 2$, where a 
cube and square are two apart. 
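
Without Sage at hand, the lattice search behind the interact can be imitated in plain Python. This is only a sketch; the function name `mordell_points` and its window parameters are assumptions chosen to match the viewing window used above.

```python
def mordell_points(k, xmax=4, ymax=8):
    """Lattice points (x, y) with |x| <= xmax, |y| <= ymax on x^3 = y^2 + k."""
    return [(x, y)
            for x in range(-xmax, xmax + 1)
            for y in range(-ymax, ymax + 1)
            if x**3 == y**2 + k]

print(mordell_points(-1))  # the curve of Figure 3.5.1
print(mordell_points(2))   # the Bachet case k = 2
```

Enlarging the window only tells you what happens inside the new window; it cannot rule out solutions further away.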


Historical remark 3.5.2 Bachet de Méziriac. We will learn more about 
Mordell in Section 15.3. André Weil in [E.5.8] describes “Claude Gaspard 
Bachet, sieur de Méziriac” as a “country gentleman ... no mathematician [who 
somehow] developed an interest in mathematical recreations”, but who in the 
end provided “a reliable text of Diophantus along with a mathematically sound 
translation and commentary.” 

Just like triangles of Pythagorean triples, this equation is connected to 
incredibly deep mathematics. The Bachet/Mordell equation connects directly 
to objects called elliptic curves. Given their importance in cryptography 
and theory, there is plenty of reason to study such equations; for instance, 
see [E.4.19, Appendix A] for the connection between congruent numbers (and 
hence Pythagorean triples) and elliptic curves. Studying them will take us too 
far afield, unfortunately. 

However, some equations of the form $x^3 = y^2 + k$ are solvable by more 
elementary means. Here are some brief examples to whet your appetite; another 
such is Proposition 7.6.3. See Section 15.3 for more details on this 
independently interesting topic. 


Historical remark 3.5.3 Bachet equation. We already saw that for 
$k = 2$ we get the solution $25 + 2 = 27$. The history is interesting; Bachet 
himself, in his translation and commentary on Diophantus, talked about finding 
rational solutions to what is now ‘his’ equation. Fermat asked the English 
mathematician John Wallis (most famous for his infinite product for $\pi$⁸ and for 
a nasty controversy with Thomas Hobbes⁹) whether there were other solutions, 
and implied there were no others. Euler proved this is the only solution, but 
using some hidden assumptions, so his proof was incomplete; see Fact 15.3.5. 


Example 3.5.4 When $k = -1$, Euler’s proof in 1738¹⁰ that $9 - 1 = 8$ is the 
only nontrivial solution is correct, however¹¹. He uses the same method of 
infinite descent we saw in Proposition 3.4.11. (He even shows that there aren’t 
any other rational number solutions to $x^3 = y^2 - 1$, all in the midst of a 
paper actually about demonstrating Exercise 3.6.17.) 


This is also related to a very old question which was called Catalan’s 
conjecture, yet again related to these funny little coincidences about powers of 
nearby numbers. Try exploring the question with the Sage cell following it. 


Question 3.5.5 Catalan’s Conjecture. Eight and nine are consecutive 
perfect (nontrivial) powers. Are there any others? 


@interact 
def _(end_range=10): 
    pretty_print(html("Solutions through numbers and powers $%s$"%end_range)) 
    # Each tuple (x,p,y,q) means x^p and y^q are consecutive: x^p + 1 == y^q 
    print([(x,p,y,q) for x in range(1,end_range) for y in 
        range(1,end_range) for p in range(2,end_range) for q 
        in range(2,end_range) if x^p+1==y^q]) 
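
The same brute-force search can be run in plain Python, where exponentiation is written `**` rather than `^`. This is a sketch that mirrors the cell above; the function name `consecutive_powers` is made up for the illustration.

```python
def consecutive_powers(end_range=10):
    """Tuples (x, p, y, q) with x^p + 1 == y^q, bases and exponents below end_range."""
    return [(x, p, y, q)
            for x in range(1, end_range) for y in range(1, end_range)
            for p in range(2, end_range) for q in range(2, end_range)
            if x**p + 1 == y**q]

print(consecutive_powers(10))
```

The four nested loops make the running time grow like the fourth power of `end_range`, so increase it cautiously.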


Historical remark 3.5.6 Catalan’s conjecture: solved. This was called 
Catalan’s conjecture because, as of 2002, the fact that there are no other such 
powers is Mihăilescu’s Theorem! The history of this question goes back to the 
1200s and Levi ben Gerson. This article by Ivars Peterson¹² and [E.4.18] have 
nice overviews of many important pieces of its history, and Wolfram 
MathWorld¹³ has an accessible introduction to the mathematics. 


⁸ en.wikipedia.org/wiki/Wallis_product 

⁹ www.press.uchicago.edu/ucp/books/book/chicago/S/bo03640378.html 

¹⁰ eulerarchive.maa.org/pages/E098.html 

¹¹ For a modern take, try [E.4.26, Problem 1.6 and Chapter 19]. 

¹² web.archive.org/web/20090219013305/http://www.maa.org/mathland/mathtrek_06_24_02.html 

¹³ mathworld.wolfram.com/CatalansConjecture.html 

3.6 Exercises 


Exercise Group. For each of the following linear Diophantine equations, 
either find the form of a general solution, or show there are no integer solutions. 


1. $21x + 14y = 147$          2. $21x + 14y = 146$ 
3. $30x + 47y = -11$          4. $30x + 47y = 2$ 
5. $4x - 6y = 77$             6. $4x - 6y = 78$ 


7. Find all possible solutions to the question in Exercise 2.5.10, now that we 
have Theorem 3.1.2. 

8. Confirm all details in Subsection 3.1.1, including which theorem applies 
and the case $a = b = 0$. 

9. Check the details and complete the proof in Subsection 3.1.4. 

10. Find all simultaneous integer solutions to the following system of equa- 
tions. (Hint: do what you would ordinarily do in high school algebra or 
linear algebra! Then finish the solution as we have done.) 


x+y +z =100 
x + 8y+50z=156 
11. Compute the number of positive solutions to the linear Diophantine 
equation $6x + 9y = c$ for various values of $c$ and compare to the three-case 
analysis at the end of Subsection 3.3.2. 


12. Explore the patterns in the positive integer solutions to the $ax + by = c$ 
situation in Section 3.3. For sure I want you to do this for the ones I mention 
there, but try some other values of $c$ and see if you see any broader 
patterns! 

13. Prove that any line $ax + by = c$ which hits the integer lattice but has 
$\gcd(a,b) \neq 1$ is the same as a line $a'x + b'y = c'$ for which 
$\gcd(a',b') = 1$, and explain why that means that without loss of generality 
Theorem 3.1.2 doesn’t need any more explanations. 


14. Find a primitive Pythagorean triple with at least three digits for each 
side. 

15. Use Proposition 3.4.9 to prove that a Pythagorean triple triangle cannot 
have odd area. 

16. Prove that 360 cannot be the area of a primitive Pythagorean triple tri- 
angle. 


17. Find a way to prove that $x^4 + y^4 = z^4$ is not possible for any three 
positive integers $x,y,z$. (Hint: use Corollary 3.4.13; this exercise needs a 
little cleverness.) 


18. We already saw that if x,y,z is a primitive Pythagorean triple, then ex- 
actly one of x,y is even (divisible by 2). Assume that it’s y, and then 
prove that y is divisible by 4. 


19. Under the same assumptions as in the previous problem, prove that ex- 
actly one of x, y, z is divisible by 3. (Combined with the previous exercise, 
this proves that every area of a Pythagorean triple triangle is divisible by 
6. Is it also true that exactly one of x,y, z is divisible by 5?) 

20. A Pythagorean triple satisfies $x^2 + y^2 = z^2$. Explore patterns for triples 
of positive integers which satisfy $x^2 - xy + y^2 = z^2$. If Pythagorean 
triples correspond to right triangles, what sort of triangles do these triples 
correspond to? 
21. Find a (fairly) obvious solution to the equation $m^n = n^m$ for $m \neq n$. Are 
there other such solutions? 
22. Show that 
$$\gcd(x,y)^2 = \gcd(x^2, xy, y^2)$$
which we use in Proposition 3.7.2. You can try this using the set of divisors 
definition of gcd, or using the definition $\gcd(a,b,c) = \gcd(\gcd(a,b),c)$. 
23. Explore Bresenham’s algorithm in print or online. What is the connection 
to this chapter? How do non-solutions to linear Diophantine equations 
relate to actual solutions, in this context? 


24. Assume you have relatively prime integers $a,b > 0$ and a positive 
integer $k$. Describe all $k - 1$ positive solutions to $ax + by = kab$, and use 
Definition 2.4.1 to find $k$ (positive) solutions to $ax + by = kab - 1$. 


25. Assume $b > a$ are odd, coprime positive integers. Show that 
$\left(\frac{b^2-a^2}{2}, ab, \frac{b^2+a^2}{2}\right)$ is a primitive 
Pythagorean triple, and that all such triples are generated this way. (See 
Remark 3.4.8.) 


26. Cultures across Eurasia have variants of the ‘Problem of the Hundred Fowl’ 
(see among others [E.5.10, Chapter 15], [E.5.1, p. 176], and [E.5.11, 
Section 1.1.1.3]). This one is from Abu Kamil¹⁴ (about 900 AD). Can you 
find all solutions with positive integers? What if you generalize the prices 
of the birds? (Finding a general solution was attempted — unsuccessfully 
— by Chinese mathematicians for generations.) 

Suppose ducks cost five coins each, chickens one coin each, but one 
coin buys twenty sparrows. If you spend one hundred coins to purchase 
one hundred birds, how many of each did you buy? 


3.7 Two facts from the gcd 


Here are two facts that seem really obvious but do need proofs. Both can 
be done just with the gcd, using no facts about primes from Chapter 6 as 
would typically be done. Kudos go to users Math Gems¹⁵ and coffeemath¹⁶ at 
math.stackexchange.com¹⁷ for most of these clever arguments. See this 
question¹⁸ for Proposition 3.7.1 and this question¹⁹ for Proposition 3.7.2. 


Proposition 3.7.1 When perfect squares divide each other. For 
integers $a,z$ it is true that 
$$a^2 \mid z^2 \implies a \mid z.$$

Proof. First, let $d = \gcd(a,z)$. Then we can write $z^2 = a^2 \cdot k$ for some 
integer $k$, and immediately write 

$$(z')^2 d^2 = (a')^2 d^2 k$$
for some integers $z'$ and $a'$, by definition of gcd. (That is, $z = z'd$ and 
$a = a'd$. Also note that $z',a'$ are now relatively prime; it is not hard to prove 
using the techniques of the previous chapter, or see Exercise 6.6.7.) 
Cancelling the $d^2$ (yes, we do assume this property of integers) yields 


$$(z')^2 = (a')^2 k.$$


¹⁴ In [E.5.3, Section 6-4] a similar example of Abu Kamil’s with five unknowns is given, 
which he claimed had exactly 2676 solutions in positive integers; today such computations 
are of high interest in computational geometry on polytopes. 

¹⁵ math.stackexchange.com/users/23500/math-gems 

¹⁶ math.stackexchange.com/users/30316/coffeemath 

¹⁷ math.stackexchange.com/users/ 

¹⁸ math.stackexchange.com/questions/286099/ 

¹⁹ math.stackexchange.com/questions/286101/ 
Since $\gcd(a',z') = 1$, we have $a'x + z'y = 1$ for some $x,y \in \mathbb{Z}$; 
now we substitute for $1$ in $z' \cdot 1 \cdot y$ (!) to get 


$$a'x + z'(a'x + z'y)y = 1.$$


Now we have that $a'(x + z'xy) + (z')^2 y^2 = 1$, so that 
$\gcd(a', (z')^2) = 1$ as well. 
But of course $a' \mid (z')^2$. Clearly if a positive number is a divisor of 
another, but their greatest common divisor is 1, then that number is going to 
have to be 1 by definition of divisors. So $a' = 1$. (If $a'$ was negative, the 
same argument for $-a'$ shows $-a' = 1$, so really $a' = \pm 1$.) 

Hence $a = a'd = \pm d$, which is a divisor of $z$, and we have the desired 
result. ∎ 


Proposition 3.7.2 When the product of coprime numbers is a square. 
If we have integers m,n,j such that mn = j? and gcd(m,n) = 1, then m and 
n are also both perfect squares. 

Proof. First, we will need a general fact about gcds: 


$$\gcd(x,y)^2 = \gcd(x^2, xy, y^2).$$
See Exercise 3.6.22. 
We know that $1 = \gcd(m,n) = \gcd(m,n,j)$, so 
$$m = m \cdot \gcd(m,n,j) = \gcd(m^2, mn, mj) = \gcd(m^2, j^2, mj).$$
Now we use the fact, so that 
$$m = \gcd(m,j)^2.$$


That’s a perfect square. 
The same argument with $n$ and $j$ yields $n = \gcd(n,j)^2$. ∎ 


(For more ‘traditional’ proofs, see Section 6.4.) 
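
A small numerical experiment makes the formulas $m = \gcd(m,j)^2$ and $n = \gcd(n,j)^2$ from the proof of Proposition 3.7.2 more tangible. The following is a hedged plain-Python sketch; the function name and the search limit are arbitrary choices for illustration, and a finite check is of course no substitute for the proof.

```python
from math import gcd, isqrt

def check_coprime_square_factors(limit=60):
    """Test m == gcd(m, j)^2 and n == gcd(n, j)^2 whenever mn = j^2 with gcd(m, n) = 1."""
    for m in range(1, limit):
        for n in range(1, limit):
            j = isqrt(m * n)
            if j * j == m * n and gcd(m, n) == 1:
                if m != gcd(m, j) ** 2 or n != gcd(n, j) ** 2:
                    return False
    return True

print(check_coprime_square_factors())
# One concrete instance: 4 * 9 = 6^2 with gcd(4, 9) = 1
print(gcd(4, 6) ** 2, gcd(9, 6) ** 2)
```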


Summary: From Linear Equations to Geometry 


This chapter contains a lot of interesting results about equations involving 
integers, including a number of geometric interpretations. 


1. In Solutions of Linear Diophantine Equations we solve all equations of the 
form $ax + by = c$ in integers. There are several cases, the most important 
being where $c = \gcd(a,b)$ in Subsection 3.1.3. 


2. The next section reinterprets these results geometrically, using the integer 
lattice. 


3. Then we try to ask for solutions to $ax + by = c$ where $x,y$ are both 
positive, continuing our geometric intuition, in Section 3.3. 


4. Moving to equations with quadratic terms, we introduce the notion of 
Pythagorean triples. 


• We prove the Characterization of primitive Pythagorean triples. 


• We also examine the possible areas of integer-sided right triangles in 
Subsection 3.4.3, including the historically very important question 
of whether such areas can themselves be a perfect square. 


5. In the last main section, we start examining further interesting questions 
such as the Bachet equation and Catalan’s Conjecture. 


Finally, after a good selection of Exercises, we have proofs of two facts about 
perfect squares using just the machinery available to us at this time. 
Chapter 4 


First Steps with Congruence 


Our next big goal is a better notion of how to deal with divisibility and re- 
mainders, one we are all familiar with. That is the notion of congruence! 

We will begin by reviewing that notion, and start asking the kinds of ques- 
tions that one will be able to ask with this notion. 


4.1 Introduction to Congruence 


Let’s start by a little calculation. What is the remainder of 25 when divided 
by 6? 


25 % 6 


In general, the command x % m computes “x modulo m”, which is to say 
the remainder of x when you divide by m. 
An alternate way to do this is with the command mod(x,m). 


mod(25,6) 


1 


In a moment this will be more desirable, but for now it is less so, because 
it creates a different kind of Sage object. 

Because of the division algorithm, we know that there is a unique such 
remainder. If we call it r (so that r = x % m), then $0 \leq r < m$, which is 
very important. However, lots and lots of different numbers can have the same 
remainder: 


[ x % 6 for x in [1, 7, 13, 19, 25, -5, -11, 6001, -17]] 


[1, 1, 1, 1, 1, 1, 1, 1, 1] 


(See Sage note 4.6.2 for this type of list construction.) 
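
Plain Python behaves the same way for a positive modulus: its `%` operator returns the remainder $r$ with $0 \leq r < m$ even for negative inputs. The sketch below (variable names are arbitrary) uses the built-in `divmod` to confirm the division-algorithm form $x = 6q + r$ for each number in the list.

```python
numbers = [1, 7, 13, 19, 25, -5, -11, 6001, -17]
remainders = [x % 6 for x in numbers]
# divmod returns (quotient, remainder); check x == 6*q + r with 0 <= r < 6
checks = [x == 6 * q + r and 0 <= r < 6
          for x in numbers
          for q, r in [divmod(x, 6)]]
print(remainders, all(checks))
```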

In mathematics, what we often do in such a situation where structure is 
shared is connect things with a relation. 

A relation is a very general notion, and basically it exists once you define it; 
however, we will not pursue this further. Our relation will be called congru- 
ence, and it is massively important. It is also relatively new! We essentially 
use the same definitions and notation that the great 19th-century German 
mathematician C.F. Gauss (see Historical remark 14.1.3) came up with just 
two centuries ago. 
Definition 4.1.1 Congruence. We say that a number $a$ is congruent to $b$ 
(another number) modulo $n$, or 


$$a \equiv b \pmod{n},$$


precisely if $n \mid (a - b)$. We call $n$, normally a positive integer greater 
than one, the modulus. The noun form of the relationship is called congruence. 
Often we can prove a small helping statement, usually called a lemma. 


Lemma 4.1.2 Congruence-Remainder. Saying $a \equiv b \pmod{n}$ is exactly 
the same thing as saying $a$ and $b$ leave the same remainder when divided by $n$. 
Proof. We can sketch the proof. It is a good exercise (see Exercise 4.7.15) to 
fill in the details. 


• Write $a = nq + r$ and $b = nq' + r'$. (Why is this possible, what are the 
various symbols?) Then there are two steps (why do they suffice?) 


• First, if $r = r'$ then there is a $k$ such that $a - b = nk$, which means 
$a \equiv b \pmod{n}$. (Why?) 


• The other direction is showing if $a - b = nk$ for some $k \in \mathbb{Z}$, 
then $r = r'$. This is a little harder; try thinking about getting the 
remainders on one side, and what $r \neq r'$ would imply with respect to $n$. 
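
Before filling in the details of the proof, it can be reassuring to see the two descriptions agree on a range of examples. This is a minimal plain-Python sketch; the function names `congruent` and `same_remainder` are made up here, and the ranges tested are arbitrary.

```python
def congruent(a, b, n):
    """Definition 4.1.1: n divides a - b."""
    return (a - b) % n == 0

def same_remainder(a, b, n):
    """a and b leave the same remainder on division by n."""
    return a % n == b % n

agree = all(congruent(a, b, n) == same_remainder(a, b, n)
            for n in range(2, 13)
            for a in range(-30, 31)
            for b in range(-30, 31))
print(agree)
```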


Example 4.1.3 In our case, saying $25 \equiv 1 \equiv -5 \pmod{6}$ is the same as 
saying $25 = 4 \cdot 6 + 1$ and $1 = 0 \cdot 6 + 1$ and $-5 = -1 \cdot 6 + 1$. 

It’s fun to use congruence as a conceptual assistant. Here is an example 
of our previous thinking recast using congruence. 


Example 4.1.4 Recall the fact about remainders when dividing by four, 
Proposition 2.1.4. This is just saying that the only possibilities are 


$$x^2 \equiv 0 \text{ or } 1 \pmod{4}.$$


Could you try to use this idea to think of possible last (decimal) digits of a 
perfect square? Which modulus would be helpful? (See Exercise 4.7.11.) 
What about cubes; what remainders are possible modulo 4? What last 
digits are possible? 
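
Once you have made your own guesses, a short experiment can confirm them. The following plain-Python sketch relies on the fact that $x^e$ modulo $n$ only depends on $x$ modulo $n$, so it is enough to try $x = 0, 1, \ldots, n-1$; the helper name `power_residues` is an assumption for this illustration.

```python
def power_residues(exponent, modulus):
    """Sorted list of the values x^exponent can take modulo `modulus`."""
    return sorted({pow(x, exponent, modulus) for x in range(modulus)})

print(power_residues(2, 4))    # squares modulo 4
print(power_residues(2, 10))   # possible last digits of squares
print(power_residues(3, 4))    # cubes modulo 4
print(power_residues(3, 10))   # possible last digits of cubes
```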


4.2 Going Modulo First 


Okay, that’s all fun. But we need power, too. Here’s an example of such power. 
Even though I’m not physically present, I can do amazing computations! Let 
me compute $2^{1000000000} \pmod{3}$! I’ll do it instantaneously. 

Ready for the answer? It’s 1! 

Perhaps you don’t believe an absent author. We can check it with Sage: 


%time 2^1000000000 % 3 


Sage note 4.2.1 Timing your work. In a Sage worksheet, putting %time 
before a command tells you how long it took. Putting %timeit instead runs the 
command many times and gives a ‘best of’ timing. (This does not universally 
work in the embedded cells in the web version of this book.) 
Hmm, but that took more than a few milliseconds; strange that I could 
do it so fast! 


Sage note 4.2.2 Numbers too big for a computer. If I add one more 
zero, it will throw a very nasty error, like MemoryError: failed to allocate 
1250000024 bytes, because things are too big. We can quickly go beyond the 
bounds of what our computers can do in number theory! 

Now consider that I did this huge computation instantaneously in my head. 
Surely I must be full of brains, like the Scarecrow in L. Frank Baum’s Oz 
books? 

Of course, the reason is not that I am clever, but that congruence can be 
turned into arithmetic! Unlike the Wizard, I will give away my secret. I just 
used the following useful property. 


Fact 4.2.3 If $a \equiv b \pmod{n}$, then $a^m \equiv b^m \pmod{n}$ no matter how 
huge $m$ is. 

Proof. See Exercises 4.7.7 and 4.7.8. ∎ 
Now I do my first congruence computation: 


$$2 \equiv -1 \pmod{3} \text{ and } (-1)^{1000000000} = 1,$$


the latter like all even powers of negative one. Ta-dah! 

What I’ve done is first think of the original number as in the congruence, 
and then taken its power. 

Sage can verify this approach is much faster, and even for much bigger 
powers. Here we will need to use the mod(x,m) syntax: 


print(mod(2,3)^1000000000) 
print(mod(2,3)^1000000000000000000000000000000) 


Even the presumptively very, very big latter computation should be as fast 
as your internet connection. 
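
For readers following along without Sage, plain Python offers a comparable shortcut: the built-in three-argument `pow(base, exponent, modulus)` reduces modulo the modulus at every step instead of building the enormous integer first. This sketch is an aside, not the book's method; the variable names are arbitrary.

```python
exponent = 10**9

# Reduce the base first: 2 is congruent to -1 modulo 3, and (-1)^even = 1.
by_hand = (-1) ** exponent % 3

# Three-argument pow never forms the full power 2^exponent.
fast = pow(2, exponent, 3)
huge = pow(2, 10**30, 3)

print(by_hand, fast, huge)
```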


Sage note 4.2.4 Give things names. We can use the print function as 
above with print( mod(2,3)^1000000000 ) to show multiple computations in 
a cell. Then again, it only prints them to the output, does not save them, and 
typing print() a lot can get annoying. 

So instead, we can assign our ‘modulo integer’ a name, like b, and then use 
it to compute. This makes it easy to do lots of interesting tests. 


b=mod(2000,31) 
b,b^1000,b^2000,b^3000,b^4000 


(16, 1, 1, 1, 1) 
The command in the last line is what prints out in any Sage cell. 


Sage note 4.2.5 Making tuples. In this case, we put commas between 
things so that all of the stuff in the last row prints out. The output is in 
parentheses because the commas create a tuple (a special Python way of making 
a list with certain nice properties). 


Sage note 4.2.6 Types matter. What was computed above is not a trick; I 
definitely couldn’t do $2000^{1000}$, or even $16^{1000}$, in my head. How does 
Sage do it? The answer lies in the kind of thing b really is, which confirms 
that Sage is using modular numbers, not normal integers. 

b=mod(2000,31) 
b, type(b) 


(16, <type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>) 


In Python, we can ask for the type of anything. In this case, we asked to 
output b and then its type, which is definitely not an ordinary integer, and can 
be manipulated much more efficiently. 

The preceding notes were a lot of computer business, and especially the 
last one may have seemed too technical. But if you just skipped it, consider 
the main point; if the computer thinks it’s a good idea to just think of the 
remainder before you do any arithmetic, maybe we should too. 


4.3 Properties of Congruence 


There are two main sets of propositions that make arithmetic with congruences 
possible. The proofs are not hard, and you may skip them on a first reading. 


Proposition 4.3.1 Congruence is an equivalence relation. Congruence 
is reflexive, symmetric, and transitive, which are the conditions for it to 
be an equivalence relation. 


• For any $a \in \mathbb{Z}$, $a \equiv a \pmod{n}$. 

• If $a \equiv b \pmod{n}$, then $b \equiv a \pmod{n}$. 

• If it happens that both $a \equiv b$ and $b \equiv c \pmod{n}$, then 
$a \equiv c \pmod{n}$ as well. 


See any intro-to-proof text for more background. For our purposes, this 
means all the things you know are true about equality are also true about 
congruence (with a particular modulus $n$ picked, of course). 

Proof. We will show each of the properties, leaving some pieces to the reader 
(Exercise 4.7.9). 


• (Reflexive) For any $a \in \mathbb{Z}$, $a \equiv a \pmod{n}$. 
    ◦ The definition of congruence means we want to show $n \mid (a - a)$. 
    ◦ But $a - a = 0$. So we claim $n \mid 0$. 
    ◦ Any questions? 
• (Symmetric) If $a \equiv b \pmod{n}$, then $b \equiv a \pmod{n}$. 
    ◦ For the reader! 


• (Transitive) If it happens that both $a \equiv b$ and $b \equiv c \pmod{n}$, 
then $a \equiv c \pmod{n}$ as well. 


    ◦ The definition of congruence means we want to show if $n \mid (a - b)$ 
    and $n \mid (b - c)$, then $n \mid (a - c)$ as well. 


    ◦ We use the definitions to see $a - b = nk$ and $b - c = n\ell$ for some 
    $k,\ell \in \mathbb{Z}$. 


    ◦ Add these two equations to get $a - c = n(k + \ell)$, which is the 
    definition of $n \mid (a - c)$. 
Proposition 4.3.2 Congruence arithmetic is well-defined. Addition 
and multiplication modulo $n$ are well-defined. That is, if $a \equiv c$ and 
$b \equiv d$ (modulo some fixed modulus $n$), then both of these congruences hold: 


1. $a + b \equiv c + d$ 


2. $ab \equiv cd$ 
Proof. Let $a \equiv c$ and $b \equiv d$ (modulo some fixed $n$). We will prove that 
$a + b \equiv c + d$ and then leave the proof that $ab \equiv cd$ to the reader in 
Exercise 4.7.10. 


• There must exist $k$ and $\ell$ such that $a = c + kn$ and $b = d + \ell n$. 


• So $a + b = c + kn + d + \ell n = (c + d) + (k + \ell)n$. 


• So $a + b$ and $c + d$ must have the same remainder modulo $n$. 
• By definition then $a + b \equiv c + d$. ∎ 


The impact of the previous result is that if I want to do a computation, I 
can pick any number with the same remainder modulo $n$, and the computation 
will get the same answer. (Hopefully I pick an easier number to work with!) 
Here is an example. 


Example 4.3.3 We collate examples of both propositions here. As an example 
of what Proposition 4.3.1 implies, $2 \equiv 5 \pmod{n}$ is the same thing as saying 
$5 \equiv 2 \pmod{n}$. Then transitivity (and a careful use of contradiction) would 
imply that if $2 \not\equiv 6 \pmod{n}$, then $5 \not\equiv 6 \pmod{n}$ either. 

More interesting are examples of Proposition 4.3.2. A basic one is to replace 
computing $2 \cdot 2 \cdot 2 \cdot 2$ modulo 3 by the choice 
$(-1) \cdot (-1) \cdot (-1) \cdot (-1)$ instead, getting the 
same answer (modulo 3). More impressive might be, instead of adding $16 + 15$ 
modulo 17, to compute instead $-1 + (-2) = -3$ in the same modulus. 


It won’t always be that clear-cut, but that is the general idea. 
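
To see Proposition 4.3.2 at work on many cases at once, here is a plain-Python sketch that swaps each of $a$ and $b$ for other members of its class and checks that sums and products stay congruent. The function name `well_defined` and the parameter `spread` (how many multiples of $n$ to shift by) are assumptions made for this illustration.

```python
def well_defined(n, spread=3):
    """Check Proposition 4.3.2 for modulus n on a small range of representatives."""
    for a in range(n):
        for b in range(n):
            for s in range(-spread, spread + 1):
                for t in range(-spread, spread + 1):
                    c, d = a + s * n, b + t * n   # c is congruent to a, d to b
                    if (a + b - (c + d)) % n or (a * b - c * d) % n:
                        return False
    return True

print(all(well_defined(n) for n in range(2, 20)))
# The addition from Example 4.3.3: 16 + 15 versus -1 + (-2) modulo 17
print((16 + 15) % 17 == (-1 + -2) % 17)
```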


4.4 Equivalence classes 


Let’s make the previous discussion a bit more rigorous by formally breaking 
up Z into disjoint subsets; by Proposition 4.3.2 we can pick any element of a 
subset for computations. 


Definition 4.4.1 Assume throughout that we have fixed a modulus $n$. 
• We call any number congruent to $a$ a residue of $a$. 
• We call the collection of all residues of $a$ the equivalence class of $a$. 
• We denote this class by the notation 


$$[a] = \{\text{all numbers congruent to } a \text{ modulo } n\}$$


(Sometimes this is notated $[a]_n$, but the modulus is nearly always evident 
from the context.) 


Example 4.4.2 For instance, the equivalence class we began with in 
Section 4.1 is of numbers congruent to 1 modulo 6, which is the set 


$$[1] = \{1, 7, 13, 19, 25, -5, -11, 6001, \ldots\}$$
perhaps better written as 


$$\{1 + 6n \mid n \in \mathbb{Z}\} = [1].$$


These congruence classes are an example of a more general construction. 


Fact 4.4.3 Any set (not just Z) that has an equivalence relation on it can be 
broken up into disjoint subsets called equivalence classes. It can be useful 
to consider these classes as elements of a set of all such classes. Such a set of 
subsets is called a partition. 
Proof. We consider this to be background; see any intro-to-proof text. ∎ 
For the relation of congruence modulo $n$, there are only finitely many classes 
(since there are only $n$ possible remainders in the division algorithm), which is 
particularly convenient. The point is you can choose your favorite number in 
an equivalence class to serve as a representative for all of them, including for 
the purposes of basic arithmetic (by Proposition 4.3.2). Let’s briefly redo part 
of Example 4.3.3 from this perspective. 


Example 4.4.4 To compute $2 \cdot 2 \cdot 2 \cdot 2$ modulo 3, we can note that 
$2 \equiv -1$ and write 
$$2 \cdot 2 \cdot 2 \cdot 2 \equiv (-1) \cdot (-1) \cdot (-1) \cdot (-1) = 1$$
which is that $16 \equiv 1 \pmod{3}$. 
Let’s solve the ‘magic trick’ at the beginning of Section 4.2 using this 
concept in a slightly different way. 


$$2^{1000000000} \equiv (2^2)^{500000000} \equiv 4^{500000000} \equiv 1^{500000000} \equiv 1 \pmod{3}.$$


Example 4.4.5 Here is something which is not a legal manipulation. 


$$2^{1000000000} \equiv 2^1 \equiv 2.$$


Even though $1000000000 \equiv 1$ modulo 3, clearly the end result is wrong, because 
$2^1 \not\equiv 1 \pmod{3}$, which we have now seen twice. 

In general, we have only seen reduction modulo n in the base of a power; 
nothing is said about the exponent! (Later, in Section 10.5, we’ll see how to 
do reduction in the exponent under controlled circumstances — with a different 
modulus, using Euler’s Theorem.) 
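
The contrast between reducing the base and (wrongly) reducing the exponent is easy to see numerically. Here is a small plain-Python sketch using the built-in three-argument `pow`; the variable names are arbitrary choices for this illustration.

```python
n, base, exponent = 3, 2, 10**9

correct = pow(base, exponent, n)
# Legal: replace the base 2 by -1, another member of the same class modulo 3.
base_reduced = pow(base % n - n, exponent, n)
# Not legal: replace the exponent 10^9 by its remainder 1 modulo 3.
exponent_reduced = pow(base, exponent % n, n)

print(correct, base_reduced, exponent_reduced)
```

The first two values agree, while the third lands in a different class.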


As you saw above, knowing the ‘right’ residue can be very helpful. Because 
of this, we make two sets of them for general use. 


Definition 4.4.6 We call a set of integers with precisely one for each 
equivalence class a complete residue system or complete set of residues for a 
given modulus. 

Usually, we just use the ‘normal’ remainders; this is called the set of least 
nonnegative residues. 

Sometimes we use the set of least absolute residues, the collection of 
representatives of each class which are closest to zero. 
Example 4.4.7 For $n = 6$, the set of least nonnegative residues 
is $\{0,1,2,3,4,5\}$, representing the set of equivalence classes 
$\{[0],[1],[2],[3],[4],[5]\}$. They are easy to think of and understand. 

In the same case the least absolute residues are $\{-2,-1,0,1,2,3\}$, standing 
in respectively for $\{[4],[5],[0],[1],[2],[3]\}$. We used these residues (for 
$n = 3$) in Example 4.4.4. 
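
Both residue systems are simple to compute. The following plain-Python sketch uses hypothetical helper names; when two representatives are equally close to zero (as with $\pm 3$ modulo 6) it keeps the positive one, matching the example above.

```python
def least_nonnegative(a, n):
    """The usual remainder of a modulo n, between 0 and n - 1."""
    return a % n

def least_absolute(a, n):
    """Representative of [a] closest to zero; a tie goes to the positive one."""
    r = a % n
    return r - n if 2 * r > n else r

print(sorted(least_absolute(a, 6) for a in range(6)))
print(sorted(least_absolute(a, 3) for a in range(3)))
```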
4.5 Why modular arithmetic matters 


This has been fun and all. But why are we creating so much machinery? There 
are two reasons. 

The first is practical. Simply put, modular arithmetic makes it much easier 
to solve certain otherwise very difficult problems about integers. The reason is 
that we can reduce problems about the (infinitely many) integers to checking 
whether things are possible when we look at the (finitely many) cases modulo 
n. For instance, we can prove things like these statements without inequalities, 
calculus, or graphs: 


• “The polynomial $x^3 - x + 2$ has no integer roots”. (See Exercise 9.6.4.) 
• “The curve $x^3 = y^2 - 7$ has no lattice points”. (See Fact 15.3.3.) 
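
To illustrate the "finitely many cases" idea, here is a hedged plain-Python sketch that takes the cubic $x^3 - x + 2$ as a sample polynomial. Any integer root would also be a root modulo 3, so it suffices to try $x = 0, 1, 2$; the helper name `has_root_mod` and the coefficient-list convention are assumptions made for this example.

```python
def has_root_mod(coeffs, n):
    """True if the polynomial (coefficients from highest degree down) has a root modulo n."""
    def value(x):
        total = 0
        for c in coeffs:
            total = (total * x + c) % n   # Horner's rule, reducing as we go
        return total
    return any(value(x) == 0 for x in range(n))

cubic = [1, 0, -1, 2]   # x^3 - x + 2
print(has_root_mod(cubic, 3))   # no root modulo 3, hence no integer root
print(has_root_mod(cubic, 2))   # modulo 2 is inconclusive: x = 0 is a root there
```

Choosing a helpful modulus is part of the art; a root modulo $n$ does not by itself give an integer root.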


The second reason for doing modular arithmetic is theoretical. We get a 
new number system! (See Chapter 8.) It’s a number system which has new 
problems, new solutions, and new things to explore. And that’s what we’ll be 
doing from now on. 


4.5.1 Starting to see further 


In order to accomplish all these goals, we will take some time learning how to 
do such computations. Here are two generic practical rules for using modular 
arithmetic. 


• First off, always first reduce modulo $n$, then do your arithmetic (add, 
multiply, exponentiate). We have seen lots of examples of this. 


• Secondly, always use the most convenient residue (recall Definition 4.4.6) 
of a number modulo $n$. 


Example 4.5.1 For example, to add [22] + [21] modulo 23 it might be smarter
to use the residues $-1 \in [22]$ and $-2 \in [21]$. The answer $[-3]$ is of course
the same as $[22 + 21] = [43] = [20]$ modulo 23.


mod(22,23)+mod(21,23)==mod(-3,23)


True 


Sage note 4.5.2 Checking equality. Use the double equals sign == to check 
if two numerical expressions are equal, not just = (which assigns a value to a 
variable). 

Here is a more involved example, which avoids the big numbers that would
otherwise occur when doing a computation without assistance.


Example 4.5.3 Let’s calculate $4^{20} \pmod{6}$. First we will use least nonnegative
residues; write the justification for each step in the margin if you have a print
copy!

$$4^{20} \equiv (4^2)^{10} \equiv 16^{10} \equiv 4^{10} \equiv (4^2)^5 \equiv 16^5 \equiv 4^5 \equiv (4^2)^2 \cdot 4 \equiv 4^2 \cdot 4 \equiv 4 \cdot 4 \equiv 4$$

On the other hand, we can avoid using numbers of absolute value bigger than
five if we carefully select least absolute residues where appropriate! Recall that
$4 \equiv -2 \pmod{6}$:
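One way to carry this out is sketched below. The particular grouping of steps
is a choice made for illustration, and other routes work just as well. Every
number that appears, apart from the exponents, has absolute value at most five.

```latex
\begin{align*}
4^{20} &\equiv (-2)^{20} \equiv \bigl((-2)^2\bigr)^{10} \equiv 4^{10}
        \equiv (-2)^{10} \equiv \bigl((-2)^2\bigr)^{5} \equiv 4^{5}
        \equiv (-2)^{5} \\
       &\equiv (-2)^2 \cdot (-2)^2 \cdot (-2) \equiv 4 \cdot 4 \cdot (-2)
        \equiv (-2)(-2)(-2) \equiv 4 \cdot (-2) \equiv (-2)(-2)
        \equiv 4 \pmod{6}
\end{align*}
```

The final residue agrees with the computation using least nonnegative residues.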
There are a few things to be aware of when doing this, of course. To remind
you of a very important such caveat, recall Example 4.4.5; with exponentiation,
you can only replace the base with something in the same congruence class.
Using the language of Proposition 4.3.2, we say that $[a]^n$ is well-defined, but
there is no guarantee that $a^{[n]}$ makes any sense.


Example 4.5.4 Just to make sure you get this, on your own compare 


2? (mod 5), 7° (mod 5), and 2° (mod 5), 7° (mod 5). 


The second pair is quite different from the first pair. 
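The caveat is easy to see numerically. The following Python sketch uses the
exponents 3 and 8 purely as an illustration (they are an assumption made here,
chosen because 3 and 8 are congruent modulo 5); the built-in three-argument
`pow` computes a power modulo n.

```python
n = 5

# Changing the base within its congruence class (2 and 7 are congruent
# modulo 5) never changes the result.
same_base_class = [pow(2, 3, n), pow(7, 3, n)]

# Changing the exponent within its congruence class (3 and 8 are
# congruent modulo 5) can change the result.
same_exponent_class = [pow(2, 3, n), pow(2, 8, n)]

print(same_base_class)
print(same_exponent_class)
```

The first list has two equal entries and the second does not, which is exactly
the difference between a well-defined base and an ill-defined exponent.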


4.5.2 Taking powers 


As one example of how modular arithmetic might matter a bit, let’s examine 
the following algorithm for taking ridiculously high powers of numbers (modulo 
n). We first need the following interesting fact. 


Fact 4.5.5 For any integer a:

1. $a^{2^1} = a^2$
2. $a^{2^2} = (a^2)^2$
3. $a^{2^3} = (a^{2^2})^2$

In general,
$$a^{2^n} = \left(a^{2^{n-1}}\right)^2.$$
That is to say, each “power of a to a power of 2” is the square of the previous
“power of a to the previous power of 2”.

Proof. What does $a^{2^n}$ even mean? By definition,
$$2^n = 2^{n-1} \cdot 2 = 2^{n-1} + 2^{n-1},$$
so $a^{2^n}$ is the same as
$$a^{2^{n-1} + 2^{n-1}} = a^{2^{n-1}} \cdot a^{2^{n-1}} = \left(a^{2^{n-1}}\right)^2.$$

Example 4.5.6 In this case, it will be easier to do examples before stating the
algorithm. To compute $x^{20}$, first we see that 16 is the highest power of 2 less
than 20.

• Compute $x^2$ modulo n.

• Square that for $(x^2)^2 = x^{2^2} = x^4$ (modulo n).

• Then square twice more for $x^{2^3} = x^8$ and $x^{2^4} = x^{16}$; we reduce modulo
n at each point.

Now write $x^{20}$ as x to a sum of powers of 2;
$$x^{20} = x^{16+4} = x^{2^4 + 2^2} = x^{2^4} \cdot x^{2^2}.$$
Then do this final multiplication modulo n as well. You might want to try it
to see you get the same thing.

Example 4.5.7 Now let’s get really explicit, and calculate $2^{23} \pmod{11}$. First,
$23 = 2^4 + 2^2 + 2 + 1$, so $2^{23} = 2^{2^4} \cdot 2^{2^2} \cdot 2^2 \cdot 2$.

Then get the powers of 2 needed:
$$2^2 \equiv 4 \pmod{11}, \quad (2^2)^2 \equiv 4^2 \equiv 5 \pmod{11},$$
$$(2^4)^2 \equiv 5^2 \equiv 3 \pmod{11}, \text{ and } (2^8)^2 \equiv 3^2 \equiv 9 \pmod{11}$$

So we get, as a computation one can do completely without a calculator,
$$2^{23} \equiv 9 \cdot 5 \cdot 4 \cdot 2 \equiv 45 \cdot 8 \equiv 1 \cdot 8 \equiv 8 \pmod{11}$$


mod(2,11)^23


8 


Algorithm 4.5.8 In general, we can compute $x^k$ modulo n:

1. Write the exponent $k = \sum_i k_i 2^i$, where each $k_i = 0$ or $1$. (This is
called the binary representation of k.)

2. Compute $x^2$, $x^4$, $x^8$, and so forth as above, each time reducing modulo n.

3. Multiply $\prod_i x^{k_i 2^i}$ together as in the examples above. Obviously, if
$k_i = 0$ (such as for $i = 3$ in the $x^{20}$ example) you skip it, as it just
contributes one to the product.
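The algorithm translates almost line for line into code. The following is a
minimal Python sketch (the function name `power_mod_by_squaring` is chosen here
for illustration). It reads off the binary digits of k from the lowest one
upward instead of writing them all out first, which is a small variation on the
steps as stated.

```python
def power_mod_by_squaring(x, k, n):
    """Compute x**k modulo n by repeated squaring (k >= 0, n >= 1)."""
    result = 1 % n          # empty product; 1 % n also handles n == 1
    square = x % n          # holds x^(2^i) modulo n, starting with i = 0
    while k > 0:
        if k % 2 == 1:      # binary digit k_i is 1: include x^(2^i)
            result = (result * square) % n
        square = (square * square) % n   # move on to x^(2^(i+1))
        k //= 2             # drop the binary digit just handled
    return result

print(power_mod_by_squaring(2, 23, 11))
print(power_mod_by_squaring(2, 23, 11) == pow(2, 23, 11))
```

The first line printed is 8, matching Example 4.5.7, and the second compares
the result with Python's built-in three-argument `pow`.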


Remark 4.5.9 Those interested in efficiency should note that this requires
roughly two operations per binary digit of the exponent, or about $2\log_2(k)$
operations for an exponent k, as opposed to normal powers, which might require
about k operations. In addition, because you mod out after each step, you only
deal with numbers of size at most $n^2$, as opposed to gigantic ones, so it
requires very little memory.


4.6 Toward Congruences 


Recall a question touched on in Example 4.1.4. 


Question 4.6.1 What are the possible last digits of a perfect cube? 

We can think of this more systematically now. For instance, if the last digit
of $x > 0$ is 3, then $x = 10m + 3$ for some integer m. That is, $[x] = [3]$ (mod
10). So the cube would look like

$$x^3 = (10m + 3)^3 = 1000m^3 + 900m^2 + 270m + 27 = 10(\text{stuff} + 2) + 7$$

This would presumably have last digit 7.
We can ask Sage to answer this for all possible last digits very quickly:


[mod(i,10)^3 for i in [0..9]]


Sage note 4.6.2 List comprehensions.
This programming structure is known as a list comprehension. Think of
it as set builder notation

$$\{ i^3 \pmod{10} \mid 0 \leq i < 10 \}$$

That’s the set of all cubes modulo 10, generated by i from 0 to 9. (Sage
replaces [0..9] with the list of integers from 0 to 9.)

If you check, what this is doing is getting the (least nonnegative) residue 
modulo 10 of the cube of every possible last digit. Notice that we also get every 
possible last digit. 
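For readers working outside Sage, the same list can be produced in plain
Python. This sketch assumes nothing beyond the standard built-ins `pow` and
`range`; the variable name `cubes` is chosen here for illustration.

```python
# Cube every possible last digit and keep only the last digit of the result.
cubes = [pow(i, 3, 10) for i in range(10)]
print(cubes)

# Every last digit appears exactly once among the cubes.
print(sorted(cubes) == list(range(10)))
```

Sorting the list and comparing it with the digits 0 through 9 is one way to
confirm that every last digit really does occur.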

It’s possible to think of this more generally. Since we just said the last digit 
is all we cared about, we could think of this as answering a related kind of 
question. For all last digits d, is there an x such that the following works?

$$x^3 \equiv d \pmod{10}$$

Definition 4.6.3 Any (integer) equation with congruence in place of equality
is called a congruence.

As a result, the previous calculation says that there is a solution to the
congruence $x^3 \equiv d \pmod{10}$ for all possible d. Another way to say this is that
every number (equivalence class) modulo 10 has a cube root. For instance, the
cube root of [7] is [3].

This is definitely not true in $\mathbb{Z}$; the usual cube root of 7 (where $7 \in [7]$) is
not even rational! This exemplifies the following fact, which one could consider
a driving force in number theory research.


Fact 4.6.4 Things which are false for the integers might be true in modular 
arithmetic. 

However, a sort of converse is also worth thinking about, where I will leave 
“things” vague for now. 


Fact 4.6.5 Things which are true for the integers are normally true in modular
arithmetic.

Now let’s try the same question again, but with a different modulus.


[mod(i,4)^3 for i in [0..3]]


[0, 1, 0, 3]


This seems to imply that every equivalence class modulo 4 “has a cube root” 
except [2]. 

This is suggestive, so maybe we can refine our generalized question.

Question 4.6.6 Given a modulus n and an integer d, identify whether there
are solutions to
$$x^3 \equiv d \pmod{n}.$$
Or, for what moduli does d have a cube root modulo n?
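For small moduli this question can simply be explored by exhaustion. Below is a
short Python sketch; the function names `cube_roots_mod` and
`residues_without_cube_root` are hypothetical helpers introduced here, and
brute force is only one of several ways to approach it.

```python
def cube_roots_mod(d, n):
    """Return every residue x in 0..n-1 with x^3 congruent to d modulo n."""
    return [x for x in range(n) if (pow(x, 3, n) - d) % n == 0]

def residues_without_cube_root(n):
    """Return the residues d in 0..n-1 that are not a cube modulo n."""
    cubes = {pow(x, 3, n) for x in range(n)}
    return [d for d in range(n) if d not in cubes]

print(cube_roots_mod(7, 10))
print(residues_without_cube_root(4))
```

The two calls reproduce the observations made so far: [7] has the cube root [3]
modulo 10, and [2] is the only class modulo 4 without a cube root. Trying other
moduli is a quick way to gather evidence before conjecturing.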

Once we’ve opened things up to one such congruence, the sky’s the limit. 
For instance, let’s take a slightly more complex quadratic. Over the integers, 
there are only two solutions to $x^2 = x$, the familiar $x = 0$ and $x = 1$. This
leads to another natural question we can ask in modular arithmetic.


Question 4.6.7 What are solutions to the congruence

$$x^2 \equiv x \pmod{n}$$

for different moduli n?

Sage can help us explore this sort of question, such as in the following
applet.


@interact
def _(n=(2,[0..100])):
    solutions=[x for x in [0..n-1] if (mod(x,n)==mod(x,n)^2)]
    pretty_print(html(r"The solutions to the congruence $x^2\equiv x$ (mod $%s$)"%(n,)))
    pretty_print(html("are "+str(solutions)))


Often, it seems we get the same answers as over the integers. But not 
always! Can you try to conjecture for which n we do get the same answer? 
(See Exercise 4.7.19.) 

We begin to see that there are two aspects of solving congruences, which 
will come up again and again for us. 


• Solving a given congruence

• Figuring out for which moduli a congruence has solutions (or how many
or ...)


Much of the course will return to these ideas; sooner, in Chapters 5 and 7, and 
later in Chapter 17. 


4.7 Exercises 


1. Give the sets of least absolute residues and least nonnegative residues for 
n= 21. 

2. Prove that 13 divides $145^6 + 1$ and 431 divides $2^{43} - 1$ without a computer
(but definitely using congruence).


Exercise Group. It is definitely worthwhile gaining intuition for modular
manipulation by doing a bunch of examples.

3. Compute 7* (mod 11) as in Subsection 4.5.2 without using Sage or 
anything that can actually do modular arithmetic. (You should never 
have to compute a number bigger than $(11 - 1)^2 = 100$, so it shouldn’t
be too traumatic.)

Repeat Exercise 4.7.3, but with 67° (mod 11). 

5. Repeat Exercise 4.7.3, but with 67° (mod 12). Why is this one easier? 

6. Make up an exercise like Exercise 4.7.3 and dare a friend in class to 
solve it. (Make sure you can solve it before doing so!) 

7. Use the properties of congruence (in Proposition 4.3.2) or the definition
to show that if $a \equiv b \pmod{n}$, then $a^3 \equiv b^3 \pmod{n}$.

8. Use the properties of congruence (in Proposition 4.3.2, not the definition)
and induction to show that if $a \equiv b \pmod{n}$, then $a^m \equiv b^m \pmod{n}$ for
any positive m.

9. Finish the details of proving Proposition 4.3.1, especially the second part 
(symmetric). 

10. Finish the details of proving Proposition 4.3.2. 


11. Find and prove what the possible last decimal digits are for a perfect
square.

12. Prove that if the sum of digits of a number is divisible by 3, then so is the
number. (Hint: Write 225 as $2 \cdot 10^2 + 2 \cdot 10 + 5$, and consider each part
modulo 3.)


13. Prove that if the sum of digits of a number is divisible by 9, then so is the 
number. 


14. For which positive integers m is $27 \equiv 5 \pmod{m}$?


15. Complete the proof of Lemma 4.1.2 that having the same remainder when 
divided by n is the same as being congruent modulo n. 


Exercise Group. Consider Example 4.5.4 in these three extensions. 


16. 


17. 


18. 


Find some a and n such that a” (mod 5) equals a"t° (mod 5), where 
a#0,landn 0. 

Try to find some a and n such that a” (mod 5) equals a”*° (mod 5), 
where a #0,1 andn #0. 


Find some a and n such that a” (mod 6) equals a"t® (mod 6), where 
a £0,1 and n £0. Then try to find an example where they are not 
equal. 


19. Explore, using the interact after Question 4.6.7 or ‘by hand’, for exactly
which moduli n the only solutions to $x^2 \equiv x \pmod{n}$ are $x = [0]$ and
$x = [1]$.


Summary: First Steps with Congruence 


This chapter introduces the extremely important notion of congruence. 


1. In Definition 4.1.1 we define $a \equiv b \pmod{n}$, and immediately note in
Lemma 4.1.2 that it is the same thing as when two numbers have the
same remainder.


2. Before examining more formal properties of congruence, we use Sage to 
confirm that it is much easier to be Going Modulo First when you try to 
compute in a congruence. 


3. We must then show that Congruence is an equivalence relation and that 
Congruence arithmetic is well-defined, so that we are justified in such 
computations modulo n. 


4. Because any equivalence relation partitions its underlying set, we can talk 
about the equivalence classes involved here, and about residue systems 
that are convenient to compute with. 


5. In the next section, we then see some practicalities: 


• In various examples like 4.5.3 and 4.5.4 it becomes clear how to
conveniently compute powers in modular arithmetic (and what you
can’t do).


• The next subsection then shows how to be systematic about this
using binary numbers in Algorithm 4.5.8, including several examples.
The key is repeated squaring, explained in Fact 4.5.5.


6. Finally, in Section 4.6, many questions are raised that should motivate 
why we would try to explore things that are like equations, but using 
congruence (recall Definition 4.6.3). 


As always, there are plenty of computational and theoretical Exercises. 


Chapter 5 


Linear Congruences 


There are many questions one can ask of the integers, and in the preceding
material we have already encountered many, especially those asking for solutions
of simple equations in one or two variables.

One can ask very similar questions (and many more) about the integers
modulo n. So we will focus on congruences, which are simply equations modulo
n (see Definition 4.6.3). To exemplify this, consider the following similar ideas:


• $2x + 3y = 5$ (solutions are pairs of integers)


• $2x + 3y \equiv 5 \pmod{7}$ (solutions would be pairs of equivalence classes
$[x], [y]$ modulo 7)


• $2x + 3y \equiv 5 \pmod{n}$ for any particular n (solutions would be triplets
$[x], [y], n$, since it would depend on n)


Try comparing solutions to these by hand; what is similar about them, what 
is not? 

In one sense the latter problems are a big improvement in the level of
difficulty. For instance, in the second one you just have to try x, y from 0 to 6
(the least nonnegative residues) in the congruence $2x + 3y \equiv 5 \pmod{7}$.

On the other hand, if the third congruence was modulo $n = 10^{100}$, that
would be less desirable, especially if the techniques for $\mathbb{Z}$ proved not to be
useful with a congruence.

Finally, if we slapped an $x^2$ in the middle of the congruence, it might be
very hard indeed to solve quickly. So in this chapter, we will stay focused
on the simplest case, of the analogue to linear equations, known as linear
congruences (of one variable). This includes systems of such congruences
(see Section 5.3).


5.1 Solving Linear Congruences 


Our first goal is to completely solve all linear congruences $ax \equiv b \pmod{n}$. The
most important fact for solving them is as follows.


Proposition 5.1.1 The linear congruence
$$ax \equiv b \pmod{n}$$
has a solution precisely when $\gcd(a, n) \mid b$.

Example 5.1.2 Before going on, test yourself by checking which of the following
four congruences have a solution and which ones don’t.


• $7x \equiv 8 \pmod{15}$
• $6x \equiv 8 \pmod{15}$
• $7x \equiv 8 \pmod{14}$
• $6x \equiv 8 \pmod{14}$


Proof of Proposition 5.1.1. The proof is pretty straightforward, as long as we
recall when linear Diophantine (integer) equations have solutions.
The following are clearly equivalent:


• Solutions x to $ax \equiv b \pmod{n}$

• Solutions x to $n \mid ax - b$

• Solutions x to $ax - b = ny$ (for some $y \in \mathbb{Z}$)

• Solutions x, y to $ax - ny = b$


And we know from Theorem 3.1.2 that this final equation has solutions precisely
when $\gcd(a, n) \mid b$. ∎
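The criterion is cheap to test, and for small moduli it can be compared against
simply trying every residue. The following Python sketch does both; the
function names `is_solvable` and `brute_force_solvable` are illustrative
helpers, not part of Sage or of the text.

```python
from math import gcd

def is_solvable(a, b, n):
    """True when a*x = b (mod n) has a solution, i.e. gcd(a, n) divides b."""
    return b % gcd(a, n) == 0

def brute_force_solvable(a, b, n):
    """Check the same thing by trying every residue modulo n."""
    return any((a * x - b) % n == 0 for x in range(n))

# The two tests agree on every congruence with a small modulus.
agree = all(
    is_solvable(a, b, n) == brute_force_solvable(a, b, n)
    for n in range(1, 25) for a in range(n) for b in range(n)
)
print(agree)
```

Agreement on small cases is of course evidence rather than proof, but it is a
useful sanity check on one's reading of the proposition.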

Just like in linear algebra or calculus, though, it’s not enough to know 
when you have solutions; you want to actually be able to construct solutions. 
If possible, one wants to construct all solutions. In this case, we can do it. 


Proposition 5.1.3 If we can construct one solution to the linear congruence
$ax \equiv b \pmod{n}$, we can construct all of them, and we know exactly how
many equivalence classes (or remainders) there are of these solutions, which is
$d = \gcd(a, n)$.

Proof. Consider the proof of Proposition 5.1.1 above. We don’t care about
y (other than that it exists, and it does). So if we have one solution to the
congruence, that is the same as having a solution $x_0, y_0$ to the equation
$ax - ny = b$.

But we already know what solutions to that look like, from Theorem 3.1.2.
Looking just at the x components, the solutions from e.g. Subsection 3.1.3
(using k since n is taken) are

$$x_0 + \frac{n}{d} k, \quad k \in \mathbb{Z}, \text{ where } d = \gcd(a, n).$$

This argument also gives us the exact number of solutions (modulo n), because
letting k go from 0 to $d - 1$ will give all different solutions. ∎


Example 5.1.4 Let’s solve
$$12x \equiv 15 \pmod{21}.$$

Here, $\gcd(a, n) = 3$ so we will have 3 solutions (up to equivalence modulo 21),
all separated by $\frac{n}{d} = \frac{21}{3} = 7$.

We need one solution first. Trying by guess and check small values gives us

• $12(1) = 12 \not\equiv 15$,

• $12(2) = 24 \equiv 3 \not\equiv 15$,

• but $12(3) = 36 \equiv 15 \pmod{21}$.

So we may take $x = 3$ as our $x_0$. Then we add 7 a couple times (mod 21) and
we see that $x = [3], [10], [17]$ all work. (Or, if you prefer least absolute residues
(recall Definition 4.4.6), then $x = [3], [10], [-4]$ work.)
Alternately,
$$3 + 7k, \quad k \in \mathbb{Z}$$
is the general solution.
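The recipe in this example (find one solution, then step by n/d) can be written
as a short function. This is a minimal Python sketch under the assumption that
the first solution is found by trial, as in the example; the name
`solve_linear_congruence` is chosen here for illustration.

```python
from math import gcd

def solve_linear_congruence(a, b, n):
    """Return the least nonnegative residues x with a*x = b (mod n)."""
    d = gcd(a, n)
    if b % d != 0:
        return []                      # no solutions at all
    # Find one solution by trial; only the residues 0..n/d - 1 need checking,
    # because the solutions are spaced n/d apart.
    step = n // d
    x0 = next(x for x in range(step) if (a * x - b) % n == 0)
    # The remaining solutions are x0 + (n/d)k for k = 1, ..., d - 1.
    return [x0 + step * k for k in range(d)]

print(solve_linear_congruence(12, 15, 21))
```

For the congruence of this example the function returns [3, 10, 17], the same
three classes found by hand.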


Remark 5.1.5 The previous example well illustrates that, while there are 
infinitely many integers which may solve a congruence, we will usually only 
consider the finitely many classes of solutions (or finitely many remainders, if 
you like). However, it is easy to be sloppy and talk about one when you mean 
the other, so be cautious. 


5.2 A Strategy For the First Solution 


The previous proposition always works. However, it can be very tedious to
find that first solution if the modulus is not small. This section is devoted to
strategies¹ for simplifying a congruence so that finding such a solution is easier.


Fact 5.2.1 Strategies that work for simplifying congruences. We can
do two main types of simplification. First, there are two types of cancellation.


• If a, b, and n all are divisible by a common divisor, we can cancel that
divisor out (keeping in mind that we still will need our final solution to
be modulo n).


• If a and b share a common divisor which is coprime to the modulus, we
can cancel that divisor from a, b (only).


See Propositions 5.2.6 and 5.2.7 for precise statements and proofs.
Secondly, there are two counterintuitive operations that may lead to a simpler
congruence (using least nonnegative residues).


• We could multiply a and b by something coprime to n. If, after reducing
modulo n, that makes a or b smaller, then that was a good idea!


• We can add some multiple of n to b. Again, if that happens to make a
and (the new) b share a factor, then it was a good idea!


These four steps may be applied in any order, though typically the first two are
done as often as possible. See Example 5.2.5 for why coprime is necessary in
two of the steps.


Example 5.2.2 A big example. Let’s do a big problem exemplifying all
the strategies; we will break it up into possible steps you might do.


Solve $30x \equiv 18 \pmod{33}$.


1. First, note that all three of the coefficients and modulus are divisible by
3. So right away we should simplify by dividing by 3. But keep in mind
that our final solution will need to be modulo 33, not modulo eleven! We
should still end up with $\gcd(30, 33) = 3$ total solutions, and if we don’t,
we have messed up somewhere.


2. Now we have $10x \equiv 6 \pmod{11}$. (Again, although this will have one
solution modulo 11, we will need to get the other two solutions modulo
33.) Since 10 and 6 are both divisible by 2, and since $\gcd(2, 11) = 1$, we
can divide the coefficients (not modulus) by 2 without any other muss.

$$5x \equiv 3 \pmod{11}$$


3. So take $5x \equiv 3 \pmod{11}$, and let’s try to replace 3 by another number
congruent to 3 modulo 11 which would allow me to use the above steps
again.

• I could try $3 + 11 = 14$, but that gives
$$5x \equiv 14 \pmod{11}$$
and 14 doesn’t share a divisor with 5 (from the $5x$).

• If I try $3 + 22 = 25$, giving
$$5x \equiv 25 \pmod{11}$$
then 25 does share a divisor with 5.


4. Now I can go back and reduce $5x \equiv 25 \pmod{11}$ to
$$x \equiv 5 \pmod{11}$$
And that’s the answer!


5. Or is it? Remember in the first step that we started modulo 33, and that
all the answers will be equivalent modulo 11. So we see that

$$x = 5 + 11k \text{ for } k \in \mathbb{Z}$$

will be the answer, which is the three equivalence classes $\{[5], [16], [27]\}$.


Does it check out?


[mod(30*x,33)==18 for x in [5,16,27]]


[True, True, True]


One final observation is that we avoided trial and error as long as possible.
At various points we could have done so, but $x = 1$ and $x = 2$ wouldn’t have
worked right away, and I am lazy...


¹ The reader should note that we roughly follow [E.2.1, pp. 50-51] in this, but that an
alternate (or supplemental?) approach using the Bezout identity is followed in texts like
[E.2.4] or [E.2.13].


Example 5.2.3 Let’s finish the previous example again, but using the other
possible counterintuitive strategy. That was the trick to multiply a and b by
something which would reduce; ideally it would reduce to $[a] = [1]$.


• We were at $5x \equiv 3 \pmod{11}$.

• Multiplying $a = 5$ and $b = 3$ by 9, which is coprime to 11, gives us
$$45x \equiv 27 \pmod{11}.$$

• This reduces to $x \equiv 5$, and gives the same answer as before (provided we
remember to get all possible answers modulo 33).

Example 5.2.4 Try completely solving one of the following two congruences
(Exercise 5.6.3) on your own now, before moving on. The rest of the Exercises
provide other interesting practice.


• $7x \equiv 8 \pmod{15}$
• $6x \equiv 8 \pmod{14}$


Example 5.2.5 Finally, let’s see examples of using the strategies poorly.

First, suppose $6x \equiv 12 \pmod{4}$. Then we could divide all terms by 2,
yielding $3x \equiv 6 \pmod{2}$, and then reducing everything modulo two we obtain
$x \equiv 0$, or that the solution is all even x. If we had instead canceled the 2 from
only the $6x \equiv 12$ portion, we would have gotten $3x \equiv 6 \pmod{4}$, which is
$-x \equiv 2$ or $x \equiv 2$ modulo four, which is only half of the true solutions.

As a similar example, suppose we want to solve $7x \equiv 7 \pmod{12}$. If we
used cancellation the solution would obviously be $x \equiv 1$. Set this aside and
instead multiply $7x$ and 7 by 2 in order to obtain $14x \equiv 14$ which simplifies to
$2x \equiv 2 \pmod{12}$, which now looks like an easy target for cancelling 2 from all
three numbers to obtain $x \equiv 1 \pmod{6}$, which is twice the true solutions.

The moral of the story is that while some structure is preserved when we
don’t stick to numbers coprime to the modulus, it’s very easy to remove or add
spurious solutions, so it must be avoided.
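Both failures can be confirmed by listing solution sets directly. The sketch
below is plain Python with a hypothetical brute-force helper named `solutions`;
it is a check on the example, not a recommended solving method.

```python
def solutions(a, b, n):
    """Least nonnegative residues x modulo n with a*x = b (mod n)."""
    return [x for x in range(n) if (a * x - b) % n == 0]

# Cancelling 2 from both sides only, although gcd(2, 4) is not 1,
# loses solutions.
print(solutions(6, 12, 4), solutions(3, 6, 4))

# Multiplying both sides by 2, although gcd(2, 12) is not 1,
# introduces spurious ones.
print(solutions(7, 7, 12), solutions(14, 14, 12))
```

The first line prints [0, 2] followed by [2], and the second prints [1]
followed by [1, 7], matching the "half" and "twice" of the discussion above.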


Here are formal statements and proofs of the propositions we used. 


Proposition 5.2.6 Canceling, Part I. If $d \neq 0$, then $ad \equiv bd \pmod{nd}$
precisely for the same a, b, n as when $a \equiv b \pmod{n}$.

Proof. Like many such proofs, you basically follow your nose.

First write $ad \equiv bd \pmod{nd}$ as $nd \mid ad - bd$, or $ad - bd = k(nd)$ for some
$k \in \mathbb{Z}$. We rewrite this as $d(a - b) = d(kn)$.

Since $d \neq 0$, asserting $d(a - b) = d(kn)$ is equivalent to saying $a - b = kn$,
which is of course by definition saying that $a \equiv b \pmod{n}$.

Since all steps were equivalences, both statements are equivalent. ∎


Proposition 5.2.7 Canceling, Part II. If $d \neq 0$ and $\gcd(d, n) = 1$, then
$ad \equiv bd \pmod{n}$ precisely for the same a, b, n as when $a \equiv b \pmod{n}$.

Proof. We already essentially know the direction when we assume $a \equiv b$
from Proposition 4.3.2. I’ll sketch the proof of the cancellation direction; see
Exercise 5.6.2 and Exercise 5.6.7.


• Use the definitions as above, starting with the ad situation.


• You should have that n divides some stuff, which is itself a product of d
and other stuff.


• We had a proposition somewhere about coprimeness and division; what
remains should yield us $a \equiv b \pmod{n}$.


5.3 Systems of Linear Congruences 


Here are three interesting problems which may seem totally unrelated at first. 


Question 5.3.1 Can you find an answer to any or all of these by trial and 
error? 


• You have lots of volunteers at a huge campaign rally. Because you are
very efficient at moving them, and you want to gauge how to group them
when dispatching them to different size venues, you line them up in rows.
You choose to group them by fives (with one left over), by sixes (two left
over), and by sevens (with three left over). How many helpers are there
total?


• You’re an ancient sky watcher, and you have discovered that three heavenly
bodies come to the region of the sky you care about with great
regularity. Comet 1 comes every five years, starting next year. Comet 2
comes every six years, starting two years from now. Comet 3 comes every
seven years, starting three years from now. When will they all come in
the same year?


• You like math a lot. You want to know what integers x simultaneously
solve the following three linear congruences:
◦ $x \equiv 1 \pmod{5}$
◦ $x \equiv 2 \pmod{6}$
◦ $x \equiv 3 \pmod{7}$

5.3.1 Introducing the Chinese Remainder Theorem 


In Section 5.2, we were able to solve any one linear congruence completely. It’s 
a good feeling. 

But we know that this is a pretty restricted result. If you’ve had a course 
in linear algebra, you’ve tried to solve big systems over the reals or complex 
numbers; sometimes in real-life operations research problems, there can be 
hundreds of thousands of linear equations to solve simultaneously! 

It turns out this is true for modular arithmetic too, especially in encryption 
standards. Can we solve a system of linear congruences? Of course, one could 
ask a computer to do it by simply checking all possibilities. 


@interact(layout=[['a_1','n_1'],['a_2','n_2'],['a_3','n_3']])
def _(a_1=(r'\(a_1\)',1), a_2=(r'\(a_2\)',2),
      a_3=(r'\(a_3\)',3), n_1=(r'\(n_1\)',5),
      n_2=(r'\(n_2\)',6), n_3=(r'\(n_3\)',7)):
    try:
        answer = []
        for i in [1..n_1*n_2*n_3]:
            if (i%n_1 == a_1) and (i%n_2 == a_2) and (i%n_3 == a_3):
                answer.append(i)
        string1 = r"<ul><li>$x\equiv %s \text{ (mod }%s)$</li>"%(a_1,n_1)
        string2 = r"<li>$x\equiv %s \text{ (mod }%s)$</li>"%(a_2,n_2)
        string3 = r"<li>$x\equiv %s \text{ (mod }%s)$</li></ul>"%(a_3,n_3)
        pretty_print(html("The simultaneous solutions to "))
        pretty_print(html(string1+string2+string3))
        if len(answer)==0:
            pretty_print(html("are none"))
        else:
            pretty_print(html("all have the form "))
            for ans in answer:
                pretty_print(html("$%s$ modulo $%s$"%(ans,n_1*n_2*n_3)))
    except ValueError as e:
        pretty_print(html("Make sure the moduli are appropriate for solving!"))
        pretty_print(html("Sage gives the error message:"))
        pretty_print(html(e))


As one might expect, this is not the most promising solution strategy. If you 
dig into the code a bit you'll see that many cases aren’t even treated properly, 
which could be very tedious to catch. 

However, in considering systems of congruences, there is a famous theorem. 


Theorem 5.3.2 Chinese Remainder Theorem. Consider a general system
of k (linear) congruences:


• $x \equiv a_1 \pmod{n_1}$
• $x \equiv a_2 \pmod{n_2}$
• $x \equiv a_k \pmod{n_k}$


where all the $n_i$ are mutually coprime. In this case, we have an algorithm for
solving the system.

Proof. This will be done in a completely constructive fashion in
Subsection 5.4.1 and Algorithm 5.4.1. ∎


Historical remark 5.3.3 Ancient Chinese work on remainders. This
kind of simultaneous solution was apparently first considered by the Chinese
mathematician Sun Tzu or Sun Zi², probably about the same time as the
late Greek mathematicians were coming up with what we now call Diophantine
equations. Individual cases of such systems were considered by several
generations of both Chinese and Indian mathematicians. A very full solution
algorithm (see Subsection 5.5.1) was given by Qin Jiushao³ in the 13th century.
See [E.5.10, Part V] for a very comprehensive discussion.

The name comes from the provenance, and is often abbreviated CRT. It
is questionable whether any actual Chinese rulers used it to decide how many
troops they had by lining them up in threes, fours, fives, etc. However, many
of the example problems in Qin’s text mention divination, alignment of different
calendars, and the like, so we can assume such problems were of practical
as well as theoretical interest. Similar questions of astronomical/astrological
importance pepper the history of mathematics. See Exercise 5.6.22 and
Exercise 5.6.23.

Finally, note that one can also go much further and do linear algebra modulo 
n, and this is a lot of what modern cryptography is about, not to mention the 
modern hard-core computational number theory for which Sage was largely 
invented. We can’t do everything in this text, but you should be aware that 
everything done in linear algebra has very interesting modulo n counterparts, 
demonstrating again this book’s theme of number theory showing the unity of 
mathematics. 


² www-groups.dcs.st-and.ac.uk/~history/Biographies/Sun_Zi.html
³ www-history.mcs.st-andrews.ac.uk/Biographies/Qin_Jiushao.html

5.3.2 The inverse of a number


To do justice to the proof of Theorem 5.3.2, we need a very useful preliminary 
concept. 


Definition 5.3.4 The Inverse of a Number. The inverse of a number a
modulo n is the least nonnegative solution of the congruence

$$ax \equiv 1 \pmod{n}.$$

It is sometimes notated $a^{-1}$.


Example 5.3.5 For example, the inverse of 26 modulo 31 is the least nonnegative
solution of
$$26x \equiv 1 \pmod{31}.$$

This is called the inverse because you can think of the solution as being
equivalent to the idea of $\frac{1}{26}$, or $26^{-1}$, in the numbers modulo $n = 31$.


Note that there is not always an inverse! 


Question 5.3.6 Ponder these questions regarding inverses. 


• What connection do a and n need if we expect there to exist an inverse
of a modulo n?


• How many inverses modulo n should a have, assuming it has one at all?


Example 5.3.7 As a first step, try to find inverses to all the numbers you can 
modulo 10. Then do it again modulo 11. 


The following Sage command computes the “inverse of 26 modulo 31”. 


inverse_mod(26,31)


Sage note 5.3.8 Getting interactive Sage help. You can look for more 
information on Sage commands by using question marks. Try inverse_mod? 
and inverse_mod?? in a notebook, command line interface, or CoCalc. (This 
also should work as embedded in the web page in your text; let us know if it 
doesn’t.) 

The point is that the inverse is definitely something we can compute, just 
by solving a linear congruence. 
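Outside Sage, the same computation can be sketched in plain Python. The search
function below is a hypothetical helper that follows the definition literally;
the call with a negative exponent relies on the built-in `pow` as available in
Python 3.8 and later.

```python
def inverse_mod_by_search(a, n):
    """Least nonnegative x with a*x = 1 (mod n), or None if there is none."""
    for x in range(n):
        if (a * x) % n == 1 % n:
            return x
    return None

print(inverse_mod_by_search(26, 31))
# Python 3.8+ offers the same computation via a negative exponent.
print(pow(26, -1, 31))
```

The search is fine for exploring small moduli by hand or by machine; for large
moduli one would want something better than trying every residue.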


5.4 Using the Chinese Remainder Theorem 


We will here present a completely constructive proof of the CRT (Theorem 5.3.2). 
That is, we will not just prove it can be done, we will show how to get a solution 
to a given system of linear congruences. 

Keep in mind that this is a procedure that works. It may have a number of 
steps, but its power is not to be underestimated. After some careful examples, 
we'll see some other uses. 


5.4.1 Constructing simultaneous solutions 


Remember that we are trying to solve the system of equations $x \equiv a_i \pmod{n_i}$.
It is important to confirm that all $n_i$ are coprime in pairs (or that the
set of moduli is mutually coprime, Definition 2.4.9). Then the following steps
will lead to a solution. You will find basically this proof in any text; I use the
notation in [E.2.1], while that in [E.2.4] basically uses the letter m instead of
n.


Algorithm 5.4.1 The following steps not only yield the solution, but mostly
indicate the proof as well.


1. First, let’s call the product of the moduli $n_1 n_2 \cdots n_k = N$.


2. Take the quotient $N/n_i$ and call it $c_i$. It’s sort of a “complement” to the
ith modulus within the big product N.


3. Now find the inverse of each $c_i$ modulo $n_i$. That is, for each i, find a
solution $d_i$ such that
$$c_i d_i \equiv 1 \pmod{n_i}$$

Notice that this is possible. You can’t find an inverse modulo any old
thing! But in this case, $c_i$ is the product of a bunch of numbers, all of
which are coprime to $n_i$, so it is also coprime to $n_i$, as required.


4. For each i, multiply the three numbers $a_i \cdot c_i \cdot d_i$.


5. Now add all these products together to get our final answer,

$$x = a_1 c_1 d_1 + a_2 c_2 d_2 + \cdots + a_k c_k d_k.$$

What remains is to verify that this works. Go back to the last two steps.


• Let us evaluate each of the products in the penultimate step (indexed by
i) modulo the various $n_j$. That looks bad, but most things cancel because
each $c_i$ is divisible by $n_j$ (except for $c_j$ itself).

◦ When $i \neq j$, the product modulo $n_j$ is thus
$a_i c_i d_i \equiv 0 \pmod{n_j}$.
◦ Otherwise we can use the definition of inverse, and the product is
$a_j c_j d_j \equiv a_j \cdot 1 \equiv a_j \pmod{n_j}$.


• To check the final step, for each $n_j$, we can do the entire sum modulo $n_j$.
The previous item shows

$$x \equiv 0 + 0 + \cdots + a_j + \cdots + 0 \pmod{n_j}.$$
So the sum is definitely a simultaneous solution to all the congruences.


Finally, any other solution $x'$ has to still fulfill $x' \equiv a_i \equiv x \pmod{n_i}$, so
$n_i \mid x' - x$ for all moduli $n_i$. Since all $n_i$ are relatively prime to each other,
$N \mid x' - x$ too (if $a \mid c$ and $b \mid c$ and $\gcd(a, b) = 1$, then $ab \mid c$). So
$x' \equiv x \pmod{N}$, which means x is the only solution modulo N!

Clearly this needs an example. 
Example 5.4.2 A first CRT example. Let’s look at how to solve our
original system from Question 5.3.1 using this method. First we write our
simultaneous congruences:

• $x \equiv 1 \pmod 5$
• $x \equiv 2 \pmod 6$
• $x \equiv 3 \pmod 7$


We'll follow along with each of the steps in Sage. First, I'll make sure I
know all my initial constants (printing them to verify). This is step 1.

n_1, n_2, n_3 = 5,6,7
a_1, a_2, a_3 = 1,2,3
N = n_1*n_2*n_3
print(n_1, n_2, n_3)
print(a_1, a_2, a_3)
print(N)

5 6 7
1 2 3
210


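Next, I'll put down all the $c_i$, the complements to the moduli, so to speak.
Remember, $c_i = N/n_i$. This is step 2 above.

n_1, n_2, n_3 = 5,6,7
a_1, a_2, a_3 = 1,2,3
N = n_1*n_2*n_3
# each c_i is the product of the other moduli
c_1, c_2, c_3 = N//n_1, N//n_2, N//n_3
print(c_1, c_2, c_3)

42 35 30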


Now we need to solve for the inverse of each $c_i$ modulo $n_i$. One could do
this by hand. For instance,

$42 d_1 \equiv 2 d_1 \equiv 1 \pmod 5$ yielding $d_1 = 3$, since $2 \cdot 3 = 6 \equiv 1 \pmod 5$.


But that is best done on homework for careful practice; in the text, we might 
as well use the power of Sage. 


d_1 = inverse_mod(42,5)
d_2 = inverse_mod(35,6)
d_3 = inverse_mod(30,7)
print(d_1, d_2, d_3)

3 5 4


That was step 3. Now I’ll create each of the big product numbers, as well 
as their sum, which is steps 4 and 5. 


n_1, n_2, n_3 = 5,6,7
a_1, a_2, a_3 = 1,2,3
N = n_1*n_2*n_3
c_1, c_2, c_3 = N//n_1, N//n_2, N//n_3
d_1 = inverse_mod(42,5); d_2 = inverse_mod(35,6)
d_3 = inverse_mod(30,7)
# the three products of step 4, then their sum from step 5
print(a_1*c_1*d_1, a_2*c_2*d_2, a_3*c_3*d_3)
print(a_1*c_1*d_1 + a_2*c_2*d_2 + a_3*c_3*d_3)

126 350 360
836
Of course, we don’t recognize 836 as our answer. But that is because the
solution is only unique modulo $N$:

n_1, n_2, n_3 = 5,6,7
N = n_1*n_2*n_3
print(N)
print(mod(836,N))

210
206


Now we see our friend 206, as expected if you successfully tried Ques- 
tion 5.3.1. 
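The steps of Algorithm 5.4.1, as carried out in this example, can be collected into one function. What follows is only a minimal sketch in plain Python rather than Sage. The name crt_construct is made up for this illustration, and the built-in pow(c, -1, n) (available in Python 3.8 and later) is assumed here as a stand-in for Sage's inverse_mod.

```python
from math import gcd, prod

def crt_construct(residues, moduli):
    """Return (x, N) with x = a_i (mod n_i) for every i, following
    the five steps of Algorithm 5.4.1.

    Assumes the moduli are pairwise coprime; raises ValueError if not.
    """
    k = len(moduli)
    for i in range(k):
        for j in range(i + 1, k):
            if gcd(moduli[i], moduli[j]) != 1:
                raise ValueError("moduli must be pairwise coprime")
    N = prod(moduli)                   # step 1: product of the moduli
    x = 0
    for a_i, n_i in zip(residues, moduli):
        c_i = N // n_i                 # step 2: complement of n_i in N
        d_i = pow(c_i, -1, n_i)        # step 3: inverse of c_i modulo n_i
        x += a_i * c_i * d_i           # steps 4 and 5: multiply, then add up
    return x % N, N                    # reduce, since x is only unique mod N

print(crt_construct([1, 2, 3], [5, 6, 7]))  # prints (206, 210)
```

Reducing modulo N at the end is a choice made for this sketch; the algorithm itself produces 836 for this system, which is the same solution modulo 210.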


Sage note 5.4.3 Printing it out. When using Sage cells, you might want
more than just the value of the last line returned to you as output. You can use
the print function to print other values, as we have done in the preceding
Example 5.4.2.

a,b,c = 1,2,3
print(a)
print(a,b,c)

1
1 2 3

This is actually a capability of Python itself, not just Sage, so if you have
previous experience with Python (or perhaps other languages), it is very important
to note that print() is a function. That means the thing to be printed must
be in parentheses, such as print(3). Earlier (in Sage versions before
9.0, and anything else based on Python 2) syntax such as print 3 was allowed,
and experienced Sage users may need some time to adjust. If you are new to
Sage, no worries!


Example 5.4.4 Let’s try some more interesting moduli for an example to do 
on your own. Can you follow the template? 


• $x \equiv 1 \pmod 6$
• $x \equiv 11 \pmod{35}$
• $x \equiv 3 \pmod{11}$


Sage can also approach this in a similar way, as we saw earlier. 


@interact(layout=[['a_1','n_1'],['a_2','n_2'],['a_3','n_3']])
def _(a_1=(r'\(a_1\)', 1), n_1=(r'\(n_1\)', 6),
      a_2=(r'\(a_2\)', 11), n_2=(r'\(n_2\)', 35),
      a_3=(r'\(a_3\)', 3), n_3=(r'\(n_3\)', 11)):
    # Labels and default values are assumed here, chosen to match Example 5.4.4.
    try:
        # brute-force search through one full period of the moduli
        answer = []
        for i in [1..n_1*n_2*n_3]:
            if (i%n_1 == a_1) and (i%n_2 == a_2) and (i%n_3 == a_3):
                answer.append(i)
        string1 = r"<ul><li>$x\equiv %s \text{ (mod }%s)$</li>"%(a_1,n_1)
        string2 = r"<li>$x\equiv %s \text{ (mod }%s)$</li>"%(a_2,n_2)
        string3 = r"<li>$x\equiv %s \text{ (mod }%s)$</li></ul>"%(a_3,n_3)
        pretty_print(html("The simultaneous solutions to "))
        pretty_print(html(string1+string2+string3))
        if len(answer)==0:
            pretty_print(html("are none"))
        else:
            pretty_print(html("all have the form "))
            for ans in answer:
                pretty_print(html("$%s$ modulo $%s$"%(ans,n_1*n_2*n_3)))
    except ValueError as e:
        pretty_print(html("Make sure the moduli are appropriate for solving!"))
        pretty_print(html("Sage gives the error message:"))
        pretty_print(html(e))


5.4.2 A theoretical but highly important use of CRT 


The following proposition is an example of one of the many useful things we 
can do with the CRT. 

Proposition 5.4.5 Converting to and from coprime moduli. Suppose
that $X \equiv Y \pmod N$, and $N = \prod m_i$, where $\gcd(m_i, m_j) = 1$ for all $i \neq j$.
Then we have two directions of equivalence between a congruence and a system
of congruences.

• Certainly if $N$ divides $X - Y$, so does a factor of $N$, so $X \equiv Y \pmod{m_i}$
for each of the relatively prime factors of $N$. Thus, solutions to the
“big” congruence are also solutions to a system of many little ones.

• But the CRT allows me to reverse this process. The moduli in question
are all coprime to each other, so if we are given a solution pair $(X_i, Y_i)$
to each of the congruences

$$X_i \equiv Y_i \pmod{m_i}$$

then when combined they will give one (!) solution of

$$X \equiv Y \pmod N.$$

As a result, any question about a congruence is really a question about 
several congruences, but with smaller moduli (indeed, simpler moduli in a 
specific sense; see Proposition 6.5.1 for a strong statement of this). We will use 
this fact again and again in the remainder of the text, and it is a huge reason 
why the Chinese Remainder Theorem is so intensely powerful. 
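A small numerical check can make the two directions of Proposition 5.4.5 concrete. The sketch below is plain Python, not a proof; the helper name congruent_mod_all and the moduli 4, 9, 5 are chosen purely for illustration. It compares the single congruence modulo N with the system of small congruences over a range of pairs, and then shows how the equivalence can fail when the moduli share a factor.

```python
from math import prod

def congruent_mod_all(X, Y, moduli):
    """True when X = Y (mod m) for every m in moduli."""
    return all((X - Y) % m == 0 for m in moduli)

moduli = [4, 9, 5]        # pairwise coprime, chosen for illustration
N = prod(moduli)          # 180

# For coprime moduli the big congruence and the system always agree.
agree = all(
    ((X - Y) % N == 0) == congruent_mod_all(X, Y, moduli)
    for X in range(2 * N)
    for Y in range(0, 2 * N, 7)
)
print(agree)  # prints True

# With moduli 4 and 6 (not coprime), 12 = 0 holds modulo each of them,
# but not modulo their product 24.
print(congruent_mod_all(12, 0, [4, 6]), (12 - 0) % 24 == 0)  # prints True False
```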


5.5 More Complicated Cases 


Solving linear congruences is a completely solved problem (up to computer 
power). Although one does not usually cover all extensions in an introductory 
course, the following subsections will introduce some, without full detail. 


5.5.1 Moduli which are not coprime 


What happens if, in a system of congruences, we don’t have the enviable
situation where all the $n_i$ are relatively prime? Let’s go back to the interact from
before one last time, with some moduli which are not pairwise coprime, and
see if we get anything.


@interact(layout=[['a_1','n_1'],['a_2','n_2'],['a_3','n_3']])
def _(a_1=(r'\(a_1\)', 1), n_1=(r'\(n_1\)', 6),
      a_2=(r'\(a_2\)', 3), n_2=(r'\(n_2\)', 8),
      a_3=(r'\(a_3\)', 3), n_3=(r'\(n_3\)', 10)):
    # Labels and default values are illustrative placeholders (assumed),
    # picked so that the moduli are not pairwise coprime.
    try:
        # brute-force search through all residues up to the product of the moduli
        answer = []
        for i in [1..n_1*n_2*n_3]:
            if (i%n_1 == a_1) and (i%n_2 == a_2) and (i%n_3 == a_3):
                answer.append(i)
        string1 = r"<ul><li>$x\equiv %s \text{ (mod }%s)$</li>"%(a_1,n_1)
        string2 = r"<li>$x\equiv %s \text{ (mod }%s)$</li>"%(a_2,n_2)
        string3 = r"<li>$x\equiv %s \text{ (mod }%s)$</li></ul>"%(a_3,n_3)
        pretty_print(html("The simultaneous solutions to "))
        pretty_print(html(string1+string2+string3))
        if len(answer)==0:
            pretty_print(html("are none"))
        else:
            pretty_print(html("all have the form "))
            for ans in answer:
                pretty_print(html("$%s$ modulo $%s$"%(ans,n_1*n_2*n_3)))
    except ValueError as e:
        pretty_print(html("Make sure the moduli are appropriate for solving!"))
        pretty_print(html("Sage gives the error message:"))
        pretty_print(html(e))


Playing with this interact should reveal that sometimes there isn’t any
solution at all, and that when there are multiple solutions they are actually
congruent modulo a smaller number! (See Exercise 5.6.24 for the latter.)

As previously mentioned, Qin discovered a very general method for getting
answers in this situation. From his method, he seems to have been aware that
an answer exists as long as $\gcd(n_i, n_j)$ divides $a_i - a_j$ for all $i$ and $j$, though he
did not explicitly state (and certainly did not prove) it. V.-A. Lebésgue⁴ was
the first to rediscover this latter fact in the modern era, in 1859. (See [E.7.44]
for a comparative analysis with some Indian and European developments.)
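The solvability condition just stated is easy to test by machine. The following plain Python sketch checks whether $\gcd(n_i, n_j)$ divides $a_i - a_j$ for every pair, and compares the verdict with a brute-force search like the one in the interact. The function names and the two sample systems are made up for this illustration.

```python
from math import gcd, prod

def is_solvable(residues, moduli):
    """Check that gcd(n_i, n_j) divides a_i - a_j for every pair i < j."""
    k = len(moduli)
    return all(
        (residues[i] - residues[j]) % gcd(moduli[i], moduli[j]) == 0
        for i in range(k)
        for j in range(i + 1, k)
    )

def all_solutions(residues, moduli):
    """Brute force: every x in [0, n_1*...*n_k) satisfying all congruences."""
    return [
        x for x in range(prod(moduli))
        if all(x % n == a % n for a, n in zip(residues, moduli))
    ]

# A solvable system whose moduli share factors.
print(is_solvable([1, 3, 3], [6, 8, 10]), all_solutions([1, 3, 3], [6, 8, 10]))

# An unsolvable one: x would have to be both odd and even.
print(is_solvable([1, 2], [6, 8]), all_solutions([1, 2], [6, 8]))
```

Comparing the gaps between the solutions listed for the first system with the moduli is a good warm-up for Exercise 5.6.24.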


5.5.2 The case of coefficients 


Another case is that of congruences not of the form $x \equiv a \pmod n$, but of the
form $Ax \equiv B \pmod n$. What can we say when there are coefficients for the
variable in our linear system?

If you have simultaneous congruences with coefficients,

$$A_i x \equiv B_i \pmod{n_i},$$

then first write their individual solutions in the form $x \equiv a_i \pmod{n_i}$. Then
you can use the CRT to get a solution of that system, which is also a solution
of the ‘big’ system.


⁴ www-history.mcs.st-andrews.ac.uk/Biographies/Lebesgue_Victor.html

Example 5.5.1 For instance, try now to solve this system:
• $2x \equiv 2 \pmod 5$
• $5x \equiv 4 \pmod 6$
• $3x \equiv 2 \pmod 7$


Surprised? Don’t forget to get back to the original modulus! 


See also Example 6.5.2 for combining these ideas with those of Proposi- 
tion 5.4.5. 
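The two-stage strategy of this subsection (solve each congruence separately, then combine with the CRT) can be sketched in plain Python. This is a hedged illustration only: the function name is invented, the sample system is not one from the text, and the sketch assumes the easy case where each coefficient is coprime to its modulus and the moduli are pairwise coprime.

```python
from math import gcd, prod

def solve_with_coefficients(coeffs, rhs, moduli):
    """Solve A_i x = B_i (mod n_i) for all i and return (x, N).

    Assumes gcd(A_i, n_i) = 1 for each i and pairwise coprime moduli.
    """
    residues = []
    for A, B, n in zip(coeffs, rhs, moduli):
        if gcd(A, n) != 1:
            raise ValueError("this sketch only handles gcd(A_i, n_i) = 1")
        # multiply both sides by the inverse of A to get x = a_i (mod n_i)
        residues.append(pow(A, -1, n) * B % n)
    # combine the individual solutions as in Algorithm 5.4.1
    N = prod(moduli)
    x = sum(a * (N // n) * pow(N // n, -1, n) for a, n in zip(residues, moduli))
    return x % N, N

# Sample system: 3x = 1 (mod 4) and 2x = 3 (mod 5).
x, N = solve_with_coefficients([3, 2], [1, 3], [4, 5])
print(x, N)                        # prints 19 20
print(3 * x % 4, 2 * x % 5)        # prints 1 3, confirming both congruences
```

When a coefficient shares a factor with its modulus, the individual congruence may have several solutions or none (Proposition 5.1.1 and Proposition 5.1.3), and this sketch deliberately refuses that case.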


5.5.3 A practical application 


Finally, there is a practical application. Suppose you are adding two very large 
numbers — too big for your computer! How would you do it? The answer is 
one can use the CRT, in particular the ideas of Proposition 5.4.5. 


• First, pick a few mutually coprime moduli smaller than the biggest you
can add on your computer.

• Then, reduce your two numbers $x$ and $y$ modulo those moduli and add
the two huge numbers in each of those moduli.

• Then the CRT allows you to put $x + y$ modulo each of the moduli back
together for a complete solution!


Needless to say, we won’t do an actual example of this. See [E.2.4, Chapter 
3.3] for a basic example and a reference. 
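A realistic example is out of scope here, but a toy-sized sketch in plain Python shows the mechanics of the three steps. Everything specific below is an assumption made for illustration: the moduli 97, 101, 103, the numbers being added, and the helper name crt. The method only recovers the true sum when $x + y$ is smaller than the product of the moduli.

```python
from math import prod

def crt(residues, moduli):
    """Combine residues for pairwise coprime moduli; returns (x, N)."""
    N = prod(moduli)
    x = sum(a * (N // n) * pow(N // n, -1, n) for a, n in zip(residues, moduli))
    return x % N, N

moduli = [97, 101, 103]   # distinct primes, hence pairwise coprime
x, y = 123456, 654321     # pretend these are too big to add directly

# Step 2: only small numbers (below each modulus) are ever added.
sum_residues = [((x % m) + (y % m)) % m for m in moduli]

# Step 3: the CRT reassembles x + y from its residues.
total, N = crt(sum_residues, moduli)
print(total, N, total == x + y)   # prints 777777 1009091 True
```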


5.6 Exercises 


1. Why do the latter two strategies in Fact 5.2.1 need no additional proof?

2. Complete the outline of the proof of Proposition 5.2.7, including “the
direction when we assume $a \equiv b$”.

3. Solve one or both of the congruences in Example 5.2.4.

4. In Proposition 5.1.1 and Proposition 5.1.3, we found solutions to $ax \equiv b
\pmod n$ in the form of congruence classes modulo $n$. But since $\gcd(a,n) =
d$ is so important here, it could be worth asking about congruence classes
modulo $n/d$ instead.

Well, for a general congruence $ax \equiv b \pmod n$, how many congruence
classes (mod $n/d$) do we get? Prove it. (A good approach is to pick a
specific problem and try it, then see if you get the same answer in general.)

5. Answer the questions in Question 5.3.6.

6. Write down two linear congruences modulo $n$ which do not have solutions
when $n = 15$, but do have solutions when $n = 16$. (You do not have
to solve them, but should explain how you know they do or do not have
solutions.)

7. Come up with a counterexample to Proposition 5.2.7 when $\gcd(d,n) \neq 1$.

Exercise Group. For each of the following linear congruences, find all of its
solutions.

8. $15x \equiv 9 \pmod{25}$
9. $6x \equiv 3 \pmod 9$
10. $14x \equiv 42 \pmod{50}$
11. $15x \equiv 42 \pmod{50}$
12. $13x \equiv 42 \pmod{50}$
13. $980x \equiv 1540 \pmod{1600}$
14. Solve the simultaneous system below. ([E.2.1, Exercise 3.8])
• $x \equiv 1 \pmod 4$
• $x \equiv 2 \pmod 3$
• $x \equiv 3 \pmod 5$

15. Solve the simultaneous system below.
• $x \equiv 2 \pmod 3$
• $x \equiv 4 \pmod 5$
• $x \equiv 6 \pmod{13}$


16. Find an integer that leaves a remainder of 9 when it is divided by either 
10 or 11, but that is divisible by 13. 

17. When eggs in a basket are removed two, three, four, five, or six at a time, 
there remain, respectively, one, two, three, four, or five eggs. When they 
are taken out seven at a time, none are left over. Find the smallest number 
of eggs that could have been contained in the basket. (Brahmagupta, 7th 
century AD — and many other variations in other cultures) 


18. Find a problem on the internet about pirates quarreling over treasure (or 
monkeys over bananas) that could be solved using the CRT, and solve it. 


19. Solve the system $4x \equiv 2 \pmod 6$, $3x \equiv 5 \pmod 7$, $2x \equiv 4 \pmod{11}$.

20. Solve the congruence $5x \equiv 22 \pmod{84}$.

21. Solve the simultaneous system $x \equiv 4 \pmod 6$, $x \equiv 7 \pmod{15}$. Note that
this doesn’t fit our pattern, but you should still be able to solve this, since
there are only two congruences. (Hint: trial and error.)

22. Solve Master Sun’s only such problem: $x \equiv 2 \pmod 3$, $x \equiv 3 \pmod 5$,
$x \equiv 2 \pmod 7$. (This same problem shows up again in Fibonacci’s Liber
Abaci.)


23. Solve one of Qin’s problems (adapted from [E.5.10, Chapter 22]). Does it
seem any more realistic than any ‘word problems’ you did in high school?

Thieves have stolen rice, measured in ge (today, about 100 milliliters),
from three identical full containers. The first thief stole all but one ge
from the first container with a ladle containing 19 ge; the second one left
fourteen ge after stealing with a shoe which could hold 17 ge; the third
left only one ge, using a bowl which held 12 ge. How much rice was lost,
and how much did each thief take?

24. Solve several systems of three congruences where $a_i = 1$ for all $i$ and where
the moduli $n_i$ are not coprime, either using technology or by hand. Compare
the minimum difference between two solutions with the various $n_i$.
Conjecture a general connection between them. (Hint: See Exercise 2.5.9,
or jump forward to Exercise 6.6.18.)


Summary: Linear Congruences 


In this chapter we begin the process of shifting from solving equations as 
‘sentences for equality’ to solving congruences as ‘sentences for congruence’. We 
start with the simplest context, linear congruences. 


1. In Proposition 5.1.1 and Proposition 5.1.3 we have a full characterization 
of solutions to the basic linear congruence ax = b (mod n). 


2. To use the previous section in situations where a solution exists, we need
Strategies that work for simplifying congruences. The cancellation
Propositions 5.2.6 and 5.2.7 are key tools.


3. It is an ancient question as to how to solve systems of linear congruences, 
and the Chinese Remainder Theorem is the prime tool for this. We also 
introduce The Inverse of a Number in this section. 


4. In the next section we then make this explicit in Algorithm 5.4.1, and 
practice it. In the future the corollary Proposition 5.4.5 will prove very 
useful. 


5. In the last section there are several more advanced topics which we briefly 
mention to inspire readers, but do not pursue — notably, Qin’s solution 
for the situation when we have Moduli which are not coprime. 


There are once again many Exercises, but it is worth mentioning that this is 
a chapter where making up your own congruences (or systems of congruences) 
is a great way to get extra practice. 


Chapter 6 


Prime Time 


Now it’s time to introduce maybe the most important concept in the whole 
course. It’s one you are almost certainly already pretty familiar with. That is 
the concept of prime numbers. 

Although we'll take a somewhat traditional route to introduce them, con- 
sider what precedes this chapter. We attacked linear congruences as far as we 
could via the concept of ‘relatively prime’/‘coprime’. But the thought should 
be gnawing at us of whether there is something deeper than simply not shar- 
ing factors other than one; what are the factors that are (or are not) shared in 
the first place? As mathematicians, we always want to ask whether there is a 
simpler notion available, or one that explains more. 

We will see the fruit of this for linear congruences in Section 6.5, using the 
most powerful tool in our arsenal, Theorem 6.3.2. But once we have unleashed 
the power of primes, we will see and use them everywhere, such as in Chapters
22 and 12. Examining them more closely will lead us to some of the deepest
mathematics of the book in Chapters 21 and 25.

So let’s get started! 


6.1 Introduction to Primes 


6.1.1 Definitions and examples 


Definition 6.1.1 A positive integer $p$ greater than 1 is called prime if the
only positive divisors of $p$ are 1 and $p$ itself.

Definition 6.1.2 If an integer $n > 1$ is not prime, it is called composite.

The first few primes are $2, 3, 5, 7, 11, \ldots$ That means $4, 6, 8, 9, 10, 12, \ldots$ are
composite. But figuring out which numbers are prime is notoriously difficult,
to the point that educational websites sometimes offer tricky games with this
as the goal; try Is This Prime¹ if you think you are good at it! Indeed, we
will spend significant time later on the question of deciding primality, such as
in Chapter 12 and Chapter 21. So below, we introduce a few Sage functions
for exploring the primes.

Here are answers to questions you might have about primes that Sage could 
answer. 


• Is a given number prime?

is_prime(6) # Is my number a prime?

False

¹ isthisprime.com/game/


• Is it at least a power of a prime?

is_prime_power(25) # Is my number a prime power?

True

• List some primes for me!

PR = prime_range(100) # What are all primes up to but not including 100?
print(PR)


[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 
47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97] 


• List the first n primes ...

PFN = primes_first_n(100) # What are the first 100 primes?
print(PFN)


[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 
47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 
103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 
157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 
211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 
269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 
331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 
389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 
449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 
509, 521, 523, 541] 


• Give me prime factors.

# What are the prime factors of a number?
factor( 2 * 3 * (2*3+1) * (2*3*(2*3+1)+1) *
        (2*3*(2*3+1)*(2*3*(2*3+1)+1)+1) )

2 * 3 * 7 * 13 * 43 * 139


Sage note 6.1.3 Making comments. Sometimes we might want to have
notes about the code included without being actual code. In the Python
language, such comments must come after # signs.


6.1.2 Prime fun


Before getting to the serious material, let’s have a little fun to start us thinking
along the lines of what’s to come. For instance, did you ever try to see if there
was a formula for primes?


f(x)=x^2+x+41
@interact
def _(n=(0,[0..39])):
    pretty_print(html("Is $%s$ for $x=%s$, which is $%s$, a prime number?"%(f(x),n,f(n))))
    print(is_prime(f(n)))


It looks like a simple polynomial can get the primes for us! Of course, I’m 
cheating a little, as the next two sets of commands show. 


f(x)=x^2+x+41
f(40)

1681

is_prime(f(40)), factor(1681)

(False, 41^2)

This example is due to Euler². For this form of polynomial it is the best
known³, but you may have thought (based on the scanty evidence of this one
example) that one could eventually find a polynomial which just gives primes.
Quite the opposite is true!


Fact 6.1.4 There is no non-constant polynomial $f(x)$ with integer coefficients
such that $f(x)$ is prime for all integers $x$.

Proof. What is the reason no such polynomial can exist? It turns out to
be directly related to our previous work on congruences. Namely, if $f(a) = p$
for some $a$, then suppose $b \equiv a \pmod p$. By well-definedness of addition and
subtraction, we then have $f(b) \equiv f(a) \pmod p$ as well (since $f$ is a polynomial!),
so

$f(b) \equiv f(a) = p \equiv 0 \pmod p$, which implies $p \mid f(b)$.

Since we assume $f(b)$ is actually prime, then $f(b) = p$ as well.
But then the problem arises that

$$f(a) = f(a + np) = p \text{ for all } n \in \mathbb{Z},$$

which contradicts the well-known calculus fact that all non-constant polynomials
have $\lim_{x \to \infty} f(x) = \infty$ or $-\infty$. So $f$ must be constant. ∎
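The key step of this proof can be watched in action with Euler's polynomial. The plain Python sketch below (an illustration only, with the starting point $a = 0$ chosen arbitrarily) takes $p = f(a)$ and checks that every value $f(a + np)$ is divisible by $p$. Since those values keep growing, they cannot all be equal to $p$, so most of them must be composite.

```python
def f(x):
    """Euler's polynomial x^2 + x + 41."""
    return x**2 + x + 41

a = 0
p = f(a)   # 41, which is prime

# Values of f at a + p, a + 2p, ..., a + 5p
values = [f(a + n * p) for n in range(1, 6)]
print(values)
print(all(v % p == 0 for v in values))   # prints True
```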

It might be a big surprise to some readers to see that limits and calculus 
can be used in number theory! It is nice to see it at such an early stage, but 
there will be more later, such as in Chapters 24 and 20. 

There are other single-variable polynomials that do happen to generate a
number of primes; an impressive one follows. Among other sites, Mathworld⁴
has lots and lots more information.

g(x)=8*x^2-488*x+7243
for n in [0..30]:
    print(g(n),is_prime(g(n)))

7243 True
6763 True
6299 True
5851 True
...
43 True
-37 False
-101 False
-149 False
-181 False
-197 False

² See [E.4.26, Chapter 11] or [E.4.8, Section 1.8] for a connection to Remark 13.3.4.
³ See references in the previous footnote.
⁴ mathworld.wolfram.com/Prime-GeneratingPolynomial.html


One can ask the opposite question of finding functions which do not make
many primes. The same website mentions the following polynomial, which
takes an astoundingly long time to generate even two primes.

h(x)=x^6+1091
for n in [0..3906]:
    if is_prime(h(n)):
        print((n,h(n)))

(0, 1091)
(3906, 3551349655007944406147)

Finally, it is an important (and, to me, somewhat frightening) fact that
Fact 6.1.4 is not true for systems of multivariate polynomials; that is, some
such systems have only prime output for integer input. See e.g. Wikipedia⁵ for
the astounding details, including a polynomial inequality that generates only
primes.


6.2 To Infinity and Beyond 


6.2.1 Infinite primes 


At this point it’s a good idea to mention that the search for 100, or 1000, or
however many prime numbers is not hopeless! That is the content of Euclid’s
famous theorem on the infinitude of the primes (Elements Proposition IX.20⁶).

Strictly speaking, he proves that no matter what $n$ is, there is always a
bigger prime $p > n$. This is not the same as proving there is an actual “infinitely
large set of primes” in the sense of Cantor⁷’s infinite cardinalities! But we still
say there are infinitely many prime numbers.

As usual, Joyce’s web version of the original⁸ is a great resource. There are
many proofs⁹ of this theorem, some of which would be corollaries of theorems
later in this text. Most use some form of proof by contradiction, but there are
exceptions, such as Saidak’s proof¹⁰ [E.7.22], which we will mention again in
Section 21.1 (see also Exercise 21.5.3). One notable proof by Furstenberg¹¹ uses
point-set topology, though this has been interpreted in a non-topological way¹²
as well. There is even a proof using regular languages/expressions ([E.7.39])
suitable for use in an upper-level computer science course on computational
models.

Here is a slightly modernized version of Euclid’s proof.


⁵ en.wikipedia.org/wiki/Formula_for_primes#Formula_based_on_a_system_of_Diophantine_equations
⁶ www.claymath.org/euclid/index/book-9-proposition-20
⁷ www-history.mcs.st-andrews.ac.uk/Biographies/Cantor.html
⁸ aleph0.clarku.edu/~djoyce/java/elements/bookIX/propIX20.html
⁹ So many that it is hopeless to keep up with all of them. I have a stack of recent proofs
in my office I originally intended to occasionally add to the references, but it quickly grew
out of proportion to other topics in the book!
¹⁰ t5k.org/notes/proofs/infinite/Saidak.html
¹¹ en.wikipedia.org/wiki/Furstenberg's_proof_of_the_infinitude_of_primes
¹² www.idmercer.com/monthly355-356-mercer.pdf
Theorem 6.2.1 Infinitude of Primes. There is no upper bound on the size
of the collection of prime numbers.

Proof. Suppose that we have found exactly $n > 0$ prime numbers, $p_1, p_2, \ldots, p_n$.
Find the smallest positive integer $N$ which is a multiple of all of these
simultaneously (we know at least one such number exists, since you could multiply
them all together).
Then either $N + 1$ is prime, or it is not¹³. If $N + 1$ is prime, then it is certainly
different from the others, so we have increased the size of the set of primes.
If on the other hand $N + 1$ is not prime, then it has some nontrivial factor;
in fact, it has a prime divisor $p$. (This distinction does actually require proof,
and is Euclid’s Book 7, Proposition 31, but we will let it follow immediately
from Theorem 6.3.2 instead.) We claim $p$ is not one of the $p_i$ already known.
If it were, then $p$ would be a divisor of both $N$ and $N + 1$, which means it is a divisor
of 1 (see Exercise 2.5.7). This is absurd (ἄτοπον, literally ‘out of place’). Can
you recall why?
So $p$ is not one of the original list, and is prime, so we have found a larger list
than before. ∎
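The construction in this proof can be tried on a concrete list. The plain Python sketch below is only an illustration: the starting list of primes and the helper name smallest_prime_factor are chosen for the example, and the product of the primes is used for $N$ (for distinct primes this is the smallest common multiple). Here $N + 1$ is not itself prime, but its smallest prime factor is indeed new.

```python
from math import prod

def smallest_prime_factor(m):
    """Smallest divisor of m greater than 1 (m itself when m is prime)."""
    p = 2
    while p * p <= m:
        if m % p == 0:
            return p
        p += 1
    return m

known = [2, 3, 5, 7, 11, 13]
N = prod(known)                      # 30030
q = smallest_prime_factor(N + 1)     # a prime divisor of N + 1

print(N + 1, q, q in known)          # prints 30031 59 False
```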
There are two things worth pointing out about this proof. First, Joyce
points out that Euclid doesn’t bother to mention that $N$ is in fact the product
of the primes in question. If one didn’t have the concept of primality,
and instead started with a set of mutually coprime positive integers (recall
Definition 2.4.9) then an analogous proof would show the (weaker but still
interesting) result that there is no upper bound on the size of such a set.
Secondly, as is typical, Euclid only proves this with a small $n$, rather than
with some modern stand-in for infinity like ellipses. See Figure 6.2.2¹⁴. Those
interested in math history will be interested in how Wallis used this to his
advantage in the Hobbes-Wallis controversy¹⁵.


Figure 6.2.2 Part of proof of Euclid IX.20 (Image courtesy of the Clay Mathematics
Institute. No commercial use allowed.)


6.2.2 The sieve of Eratosthenes 


Much later in the text we will talk some about efficient ways to tell if a number
is prime, or even to generate new prime numbers (see Chapter 12, for example).
For now, we will use something usually known as the Sieve of Eratosthenes.

¹³ Euclid used line segments to indicate magnitudes, including integer ones like what we
call $N + 1$. So this claim looks like ὁ δὴ ΕΖ ἤτοι πρῶτός ἐστιν ἢ οὔ in the original, where EZ
is the line segment.
¹⁴ See the Clay Math Institute website (www.claymath.org/Library/historical/euclid/) for
more images from this manuscript, which is well over a millennium old, held at the Bodleian
library at Oxford.
¹⁵ www.maa.org/publications/maa-reviews/squaring-the-circle-the-war-between-hobbes-and-wallis


Algorithm 6.2.3 Sieve of Eratosthenes. To check whether a number $n > 1$
is composite or prime, it suffices to divide by all primes $p \le \sqrt{n}$. Anything that
isn't divisible by these is prime.

Proof. If $n$ is not prime (composite), we can write $n = de$ for integers $d$ and $e$
both strictly between 1 and $n$. If both $d, e > \sqrt{n}$, then

$$n = de > (\sqrt{n})^2 = n,$$

a contradiction. So at least one of $d$ and $e$ is at most $\sqrt{n}$, and any prime
dividing it also divides $n$. ∎
This is indeed an algorithm, because it provides a specific procedure to
identify primes up to a specific limit.


Example 6.2.4 To get all prime numbers up through 100, it suffices to remove
any numbers divisible by 2, 3, 5, or 7 (other than those four primes themselves),
as $\sqrt{100} < 11$.
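The sieve is short enough to write out in full. Here is a minimal sketch in plain Python (the function name sieve is chosen for this illustration; Sage's own prime_range does the same job far more efficiently). It crosses off multiples of each prime $p$ with $p^2$ at most the limit, exactly as in the example.

```python
def sieve(limit):
    """Return the list of primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:                 # only primes up to sqrt(limit) matter
        if is_prime[p]:
            # smaller multiples of p were already crossed off by smaller primes
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]

primes = sieve(100)
print(len(primes), primes[-1])   # prints 25 97
```

For a limit of 100 the outer loop only ever crosses off multiples of 2, 3, 5, and 7.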


Historical remark 6.2.5 Eratosthenes. Eratosthenes was a contemporary 
of Archimedes, and no slouch. He is best known for estimating the size of the 
Earth fairly accurately, amazingly so for the time. (Along the way, that puts 
the lie to those who would claim everyone thought the earth was flat until 
Columbus.) 

Finding tighter results, like the smallest prime above a certain number,
requires more advanced techniques like the ones in Section 12.2. An interesting,
but completely impractical, fact (see [E.7.34]) is that the smallest prime
exceeding $n$ is the smallest nontrivial divisor of $n!^{n!} - 1$.


6.3 The Fundamental Theorem of Arithmetic 


6.3.1 Preliminaries and statement 


Our biggest goal for this chapter, and the motive for introducing primes at this 
point, is the Fundamental Theorem of Arithmetic, or FTA. It should probably 
be called the Fundamental Theorem of Number Theory, but in older usage one 
said “arithmetic”, and the name has stuck. 


Definition 6.3.1 A factorization of an integer is a way of writing it as a
product of integers. This nearly always refers to one of two things, which are
mentioned explicitly if there is danger of ambiguity:

• A product of prime numbers is called a prime factorization.

• A product into positive powers of (distinct) primes is called a prime
power factorization.


Theorem 6.3.2. Fundamental Theorem of Arithmetic. The following
are true:

• Every integer $N > 1$ has a prime factorization.

• Every such factorization of a given $N$ is the same if you put the prime
factors in nondecreasing order (uniqueness).

More formally, we can say the following. Any positive integer $N > 1$ may
be written as a product

$$N = \prod_{i=1}^{n} p_i$$

of primes, and further, if we can write a different such product

$$N = \prod_{j=1}^{m} q_j$$

then $m = n$ and a reordering of the $q_j$ will make them the same as the $p_i$.
Proof. We will prove this in Subsection 6.3.2. ∎


Example 6.3.3 For instance:
• $30 = 2 \cdot 3 \cdot 5$
• $24 = 2 \cdot 3 \cdot 2 \cdot 2 = 2 \cdot 2 \cdot 2 \cdot 3$

Clearly (from normal experience) the only possible factorizations other than these
would just put the primes in a different order. Why doesn’t this work for
$N = 1$? (See Exercise 6.6.24.) As it happens, Euclid did not even consider one
to be a number in the same sense as the others; see Joyce’s commentary¹⁶.


Example 6.3.4 Usually we will implicitly assume the primes are in nondecreasing
order, and write $3^2$ instead of $3 \cdot 3$ (with the primes now necessarily
in increasing order), so the following notation is common to express a prime
power factorization:

$$N = \prod_{i=1}^{n} p_i^{e_i}.$$

Sometimes when the context is clear, one can even write $N = \prod p$ or $N = \prod p^e$.
Using the same numbers as in the previous example:

• $30 = 2^1 \cdot 3^1 \cdot 5^1$
• $24 = 2^3 \cdot 3^1$


Example 6.3.5 Just to get this down, practice writing the following as a 
product of such prime powers. 


• $N = 12100$
• $N = 1250$
• $N = 83072$


See Exercise 6.6.14. 
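To check your practice answers, a prime power factorization can be computed by repeated trial division. The following is a minimal plain Python sketch (the function name is invented for this illustration; in Sage, the built-in factor command shown earlier does this directly). It returns the primes in increasing order together with their exponents.

```python
def prime_power_factorization(N):
    """Return [(p, e), ...] with N equal to the product of p**e,
    primes in increasing order, found by trial division."""
    if N < 2:
        raise ValueError("N must be an integer greater than 1")
    factors = []
    p = 2
    while p * p <= N:
        if N % p == 0:
            e = 0
            while N % p == 0:     # divide out every copy of p
                N //= p
                e += 1
            factors.append((p, e))
        p += 1
    if N > 1:                     # whatever is left over is a single prime
        factors.append((N, 1))
    return factors

print(prime_power_factorization(24))   # prints [(2, 3), (3, 1)]
print(prime_power_factorization(30))   # prints [(2, 1), (3, 1), (5, 1)]
```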


6.3.2 Proof of the FTA 


This theorem is quite old, and of course Euclid has a nice proof of it¹⁷, along
with various lemmata (the plural of lemma¹⁸, though I’ll also use “lemmas” in
this text) that he needs to get there. The key ingredients are:

• If a number is prime, that is the prime factorization.

• If a number is composite, then it is by definition divisible by some smaller
positive number other than one. (Euclid used the stronger fact that it is
divisible by a prime in his proof of Infinitude of Primes.)

• This process can be continued, but only finitely often.

• Any other way in which you can write the same number as a product of
primes is just a reordering of the one obtained in the previous step.

¹⁶ mathcs.clarku.edu/~djoyce/java/elements/bookVII/defVII1.html
¹⁷ aleph0.clarku.edu/~djoyce/java/elements/bookIX/propIX14.html
¹⁸ www.duden.de/rechtschreibung/Lemma

The last step requires the following lemma, which is Euclid’s Book 7,
Proposition 30¹⁹.
Lemma 6.3.6 If a prime $p$ divides a product $ab$, then $p$ divides at least one of
$a$ or $b$.
Proof. Left to reader in Exercise 6.6.3; this is very closely related to
Proposition 2.4.10. ∎


Corollary 6.3.7 If a prime $p$ divides a finite product of numbers, then $p$ divides
at least one of them, i.e.

$$p \mid \prod_{k=1}^{\ell} a_k \text{ implies } p \mid a_k \text{ for at least one } k.$$

Proof. By induction, left to reader in Exercise 6.6.4. ∎
Okay, now we need the details.

Proof of Theorem 6.3.2. Let’s use induction on the size of N. So our base case 
is N = 2, which is of course prime so it has (the) unique factorization 21. 

For the induction step, first suppose we have proved that all numbers up to 
N can be written as a product of primes (uniquely or not). Then we look at 
N +1 to continue the induction. 


e If N+1 is prime, that is its prime factorization, as with 2. 


e If not, then by definition N + 1 is composite, so N + 1 = ab, where 
1<a,b< N+1. (Note why a,b are smaller! Recall the proof of the 
Sieve 6.2.3.) In this case, by the induction hypothesis, a and b have prime 
decompositions [| p; and [] q;, since they are less than N + 1 but not 1, 
and so N+1=[][pi[]q. 


By induction, this shows that a prime factorization exists for all numbers up 
to N +1. It remains to be shown that such a factorization is unique. 
So first rewrite our factorization in a given order (such as nondecreasing):

$$N+1 = \prod_{i} p_i, \qquad p_i \le p_{i+1}.$$

Now let’s look at another possible representation, possibly with different
primes:

$$N+1 = \prod_{j} q_j.$$

At this point we need Corollary 6.3.7. By assumption, $p_1$ divides $N+1$. Hence,
by the corollary, $p_1$ divides at least one of the $q_j$. But the only positive divisors
of a prime are itself and 1, and $p_1$ is prime (not one), so $p_1 = q_j$.

Cancel these from both products to get two different representations of (the
integer) $\frac{N+1}{p_1}$ as a product of primes. By the induction hypothesis, since this
number is less than $N+1$, these representations are unique up to reordering,
so multiplying both by $p_1$ to get $N+1$ must also be unique up to reordering.
By induction, we are done. ∎
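
The existence half of this proof is really a recursive procedure: either the number is prime, or it splits as $ab$ with two smaller factors that get the same treatment. Here is a minimal sketch of that recursion in plain Python. The name `prime_factorization` is our own choice for illustration, and the divisor search is simple trial division, which is an assumption made for brevity and not a claim about how Sage factors integers.

```python
def prime_factorization(n):
    """Return the prime factors of n >= 2 in nondecreasing order."""
    # Look for a divisor a with 1 < a < n, as in the composite case of the proof.
    for a in range(2, n):
        if a * a > n:
            break  # any proper divisor pair has a member at most sqrt(n)
        if n % a == 0:
            # n = a * b with both factors smaller than n, so recurse on each.
            return sorted(prime_factorization(a) + prime_factorization(n // a))
    # No such divisor exists, so n is prime and is its own factorization.
    return [n]


print(prime_factorization(693))  # [3, 3, 7, 11]
```

Sorting the result corresponds to writing the factorization in nondecreasing order, which is the form in which uniqueness is stated.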


¹⁹ aleph0.clarku.edu/~djoyce/java/elements/bookVII/propVII30.html
Two comments about this proof are in order. First, some students may
wonder about this induction proof, because in the induction hypothesis we do
not simply assume a single $N$ has a (unique) factorization (as in Example 1.2.4),
but that all $n < N$ do. But this is just an artifact²⁰ of the statement we are
proving. The logical induction statement is not that “$N$ has a factorization”,
but that “all numbers less than $N$ have a factorization”.

Second, if you are familiar with other algebraic structures, it is very impor- 
tant to note that other algebraic systems may not have unique factorization 
into primes, or even have a notion of prime elements! Even some structures 
very similar to the integers fail this; many interesting examples of this are just 
beyond the level of this course. For those who must know what this means 
now, try Exercise 6.6.30. 


6.4 First consequences of the FTA 


The impact of the FTA is so great, I cannot overstate its significance. This 
section collates a few examples, but you will see similar ones throughout the 
text, as well as in the next section, when we connect the theorem back to 
congruences. 

Most importantly, lots of theorems now have reasons, not just proofs. This 
distinction is an important point about mathematics! The difference boils 
down to the fact that gcd(a,b) = 1 can be interpreted as saying a and b do 
not share any common prime factors. You will (re)prove a few things in the 
Exercises 6.6 to try this insight out. Here is a first example to give the feel. 


Example 6.4.1 If $a \mid c$, $b \mid c$, and $\gcd(a,b) = 1$, then $a = \prod p_i$ and $b = \prod q_j$
but none of the $p_i$ can be any of the $q_j$ (or the gcd would include that prime).
Since by the FTA $c = \prod r_k^{e_k}$, where the $r_k$ are distinct, the $p_i$ must be some
of the collection of $r_k$s and the $q_j$ must be some of the rest, so that $\prod p_i \prod q_j$ still
divides $c$.
So if $a \mid c$, $b \mid c$, and $\gcd(a,b) = 1$, then $ab \mid c$, which is part of Proposition 2.4.10.
As another example, the proofs from Section 3.7 become far simpler. We 
can prove Proposition 3.7.1 here, and save Proposition 3.7.2 for Exercise 6.6.12. 


Example 6.4.2 Let’s show that $a^2 \mid z^2$ implies $a \mid z$.
Solution. To begin, let’s write $a = \prod p^e$. Then

$$a = \prod p^{e} \quad\text{implies}\quad a^2 = \prod p^{2e}.$$

Similarly,

$$z = \prod q^{f} \quad\text{implies}\quad z^2 = \prod q^{2f}.$$

If these two numbers divide each other, then we can separate the product by
each prime, so that for each $p$,

$$p^{2e} \mid q^{2f}$$

for some $q$; in fact we must have $q = p$ for each such case²¹. But then $p^{2f} =
p^{2e} p^{2f-2e}$ and this can be viewed as $2e \le 2f$, so $e \le f$ as well.
This is true for all the primes $p$ dividing $a$, so $p^e \mid p^f = q^f$ for all such $p$;
multiplying these together shows that

$$a = \prod p^{e} \mid \prod p^{f} \mid \prod q^{f} = z$$


20In my view, there is no pedagogical need for a separate notion of ‘strong induction’. 
as desired.
The reader should note that for such proofs, the implicit use of Corollary 6.3.7
is crucial along with the FTA.
Nearly as important, computing many kinds of things becomes easier. If
we let $a = \prod_{i=1}^{n} p_i^{e_i}$ and $b = \prod_{i=1}^{n} p_i^{f_i}$, where it’s possible that $e_i$ or $f_i$ is zero
at times, then we can often get formulas for various combinations of $a$ and $b$.


Definition 6.4.3 Given two numbers $x < y$, we let the maximum and minimum
be defined by

$$\max(x,y) = y \quad\text{and}\quad \min(x,y) = x$$

with an obvious extension to a min or max of a set consisting of more than two
numbers. ◊
Then we have formulas of the following kind. 


Example 6.4.4 Product formula:

$$ab = \prod_{i=1}^{n} p_i^{e_i + f_i}$$

Greatest common divisor formula:

$$\gcd(a,b) = \prod_{i=1}^{n} p_i^{\min(e_i, f_i)}$$

Determining a quotient formula, assuming $b \mid a$, is Exercise 6.6.8:

$$a/b = \prod_{i=1}^{n} p_i^{??}$$
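
To see the product and gcd formulas in action, here is a small sketch in plain Python that stores the exponents of each prime in a dictionary and combines them exactly as the formulas say. The helper names `exponents` and `gcd_from_exponents` are hypothetical, chosen just for this illustration, and the factoring is done by naive trial division.

```python
from collections import Counter
from math import gcd


def exponents(n):
    """Map each prime p dividing n to its exponent, by trial division."""
    result = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            result[p] += 1
            n //= p
        p += 1
    if n > 1:
        result[n] += 1  # whatever is left over is a single prime
    return result


def gcd_from_exponents(a, b):
    """Compute gcd(a, b) as the product of p^min(e, f) over all primes p."""
    ea, eb = exponents(a), exponents(b)
    g = 1
    for p in set(ea) | set(eb):
        g *= p ** min(ea[p], eb[p])  # a missing prime counts as exponent 0
    return g


a, b = 693, 756
print(gcd_from_exponents(a, b), gcd(a, b))             # 63 63
print(exponents(a * b) == exponents(a) + exponents(b))  # True: exponents add
```

Allowing an exponent of zero, as the text does, is what lets both numbers be written over the same list of primes.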


Another use of the FTA is to help us obtain, in a systematic way, results that
were probably first obtained by extremely ad-hoc methods. As an example, it
is likely that you have seen a proof that $\sqrt{2}$ is irrational, and it probably used
mostly the concept of “evenness”. But we can prove that $\sqrt{m} \notin \mathbb{Q}$ (for $m$ not
an integer perfect square) in a very similar fashion.

Most deeply, it gives us a canonical way to describe every integer in terms 
of simpler integers, and gives a measure of simplicity. We’ll exploit this later 
in the course, especially in Chapter 24. 

Next are some ways to calculate these concepts in Sage. Simply replace the 
numbers below with ones you are interested in. 


prime_divisors(693)


[3, 7, 11]


factor(693)


3^2 * 7 * 11


Note that the first of these functions gives just a list of the prime divisors, 
while the second one gives the full prime power factorization. 

Finally, let’s note that depending on the context, we might not need the 
full power of the computational and theoretical tools in this section. To demon- 
strate that, let’s introduce some useful additional notation. 


21To be pedantic, the set of prime factors q of z? contains the set of prime factors p of a?. 
Definition 6.4.5 For $p$ prime, we say that $p^k \parallel n$ precisely when $p^k \mid n$ but
$p^{k+1}$ does not divide $n$. ◊


Definition 6.4.6 We write $n!$ for the product of the integers from 1 to $n$,
called $n$ factorial. ◊


Example 6.4.7 We can demonstrate these by saying $5^2 \parallel 75$ and $6! = 720$.


The prime factorization of a number can now separately give useful infor- 
mation about it. 


Example 6.4.8 How many final zeros does twenty factorial have? 


Solution. Either by hand or with help, we can see what the biggest powers 
of 2 and 5 in 20! are. 


factor(factorial(20))


2^18 * 3^8 * 5^4 * 7^2 * 11 * 13 * 17 * 19


Since $2^{18} \parallel 20!$ and $5^4 \parallel 20!$, we can conclude that $20!$ ends with exactly 4
zeros merely from the prime factorization, which we could certainly get without
multiplying it out (though in this case Sage does that first).

We can check this result: 


factorial(20)


2432902008176640000 
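
If we really want to avoid multiplying out the factorial, one option is to count the exponent of a prime in $n!$ directly. The sketch below uses Legendre’s formula, which is not introduced in this text: the exponent of $p$ in $n!$ is $\lfloor n/p \rfloor + \lfloor n/p^2 \rfloor + \cdots$. The function names are our own, and this is only one possible way to do the count.

```python
from math import factorial


def exponent_in_factorial(p, n):
    """Return e with p^e || n!, for p prime, without computing n!."""
    e = 0
    power = p
    while power <= n:
        e += n // power  # multiples of p, then of p^2, and so on
        power *= p
    return e


def trailing_zeros_of_factorial(n):
    """Each final zero needs one factor of 2 and one factor of 5."""
    return min(exponent_in_factorial(2, n), exponent_in_factorial(5, n))


print(exponent_in_factorial(2, 20), exponent_in_factorial(5, 20))  # 18 4
print(trailing_zeros_of_factorial(20))                             # 4
```

The two exponents agree with the factorization of $20!$ shown above.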


6.5 Applications to Congruences 


6.5.1 Factoring the modulus 


The reason the fundamental theorem is so useful for congruences is that prime 
powers (for different primes) are automatically relatively prime to each other. 
So in using the Chinese Remainder Theorem (Theorem 5.3.2) we don’t have to
spend time looking for coprime factors; we can just factor into prime powers
using the Fundamental Theorem of Arithmetic. So here is a useful repositioning 
of Proposition 5.4.5. 


Proposition 6.5.1 Converting to and from prime powers. Suppose that
$X \equiv Y \pmod N$, and $N = \prod p_i^{e_i}$. Then we have an equivalence between this
congruence and a related system of congruences.


• Certainly if $N$ divides $X - Y$, so does every factor of $N$, so
$X \equiv Y \pmod{p_i^{e_i}}$ for each of the prime power factors of $N$. (Once again, solutions to
the “big” congruence are also solutions to a system of many little ones.)


• Conversely, the prime powers in a factorization are all coprime to each
other, so if we are given a solution pair $(X_i, Y_i)$ to each of the congruences

$$X_i \equiv Y_i \pmod{p_i^{e_i}}$$

then when combined they will give a solution of

$$X \equiv Y \pmod N.$$
That means that any question about congruences is really a question about
systems of congruences modulo prime powers. We will use this fact again and
again in the remainder of the text, and it is a huge reason why the CRT is so
intensely powerful.

Similarly, referring to Subsection 5.5.2, what if one has one complicated
congruence with coefficients and a composite modulus $N$?

$$Ax \equiv B \pmod N$$

Just take $N = p_1^{e_1} \cdots p_k^{e_k}$ and then solve all the congruences $Ax \equiv B \pmod{p_i^{e_i}}$
first. Then use the Chinese Remainder Theorem to ‘patch’ them together for
a final solution. This is a little tedious, but certainly doable.


Example 6.5.2 Let’s solve the following congruence using the method in the
previous paragraph:

$$21x \equiv 33 \pmod{180}.$$

Here are some steps:

• Create the individual congruences

• Solve them

• Put them back together
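
The three steps above can be sketched in plain Python (3.8 or later, for `pow` with a negative exponent). Everything here is an illustrative assumption rather than the text’s own code: the helper names are ours, each small congruence is solved by brute-force search, and the demonstration uses a different congruence, $5x \equiv 7 \pmod{12}$, so that Example 6.5.2 itself is still left for you to work out.

```python
from itertools import product


def prime_power_factors(n):
    """Return the prime powers p^e exactly dividing n, by trial division."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            factors.append(q)
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def crt_pair(r1, m1, r2, m2):
    """Combine x = r1 (mod m1) and x = r2 (mod m2) for coprime m1, m2."""
    t = ((r2 - r1) * pow(m1, -1, m2)) % m2  # pow(m1, -1, m2): inverse of m1 mod m2
    return (r1 + m1 * t) % (m1 * m2), m1 * m2


def solve_linear(A, B, N):
    """All classes x mod N with A*x = B (mod N), via prime power moduli."""
    moduli = prime_power_factors(N)            # step 1: individual congruences
    local = [[x for x in range(q) if (A * x - B) % q == 0] for q in moduli]  # step 2
    solutions = []
    for choice in product(*local):             # step 3: patch together with CRT
        r, m = 0, 1
        for ri, qi in zip(choice, moduli):
            r, m = crt_pair(r, m, ri, qi)
        solutions.append(r)
    return sorted(solutions)


print(solve_linear(5, 7, 12))  # [11]
```

Calling `solve_linear(21, 33, 180)` is one way to check your own answer to the example once you have worked it by hand.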


6.5.2 Moduli that are prime powers 


When it comes to linear congruences, these consequences of the Chinese Remainder
Theorem and Fundamental Theorem of Arithmetic suggest that we
reconsider the prime power case with a more subtle tool. Assume that in
solving a bunch of congruences

$$x \equiv a_i \pmod{n_i}$$

we would like to start by solving congruences

$$x \equiv a_i \pmod{p^e}$$

where $p^e$ divides $n_i$.

The general approach, then, is to first solve modulo $p$, in the hope that this
could lead to a solution modulo $p^e$. Consider the following extended example,
divided into two parts.


Example 6.5.3 Prime Power Congruences. One reason we might want
to solve such a congruence is for finding an inverse (recall Definition 5.3.4) for
various purposes, so suppose we want to find the inverse of 4 modulo $49 = 7^2$.
That is solving $4x \equiv 1 \pmod{49}$.

First, let $f(x) = 4x - 1$. The only solution of $4x \equiv 1 \pmod 7$ is clear; it is
$x = [2]$. How might we get solutions (mod 49) from this? We delineate relevant
steps.

• First, any solution of $4x \equiv 1 \pmod{7^2}$ is also a solution of $4x \equiv 1 \pmod 7$.
So $x \equiv 2 + 7k \pmod{49}$ for some $k$, since $[2] = \{2 + 7k \mid k \in \mathbb{Z}\}$.

• Plugging $2 + 7k$ in the original congruence yields

$$4x = 4(2 + 7k) = 4 \cdot 2 + 4 \cdot 7k \equiv 1 \pmod{49},$$

or, rearranging (but keeping everything unmultiplied),

$$1 - 4 \cdot 2 \equiv 4 \cdot 7k \pmod{7^2}.$$

• Now, we know that $7 \mid 1 - 4 \cdot 2$, because we already know that 2 solved
our original congruence:

$$1 \equiv 4 \cdot 2 \pmod 7.$$

So we can cancel out 7 from the entire congruence (as in Proposition 5.2.6)
to get that

$$\frac{1 - 4 \cdot 2}{7} \equiv 4k \pmod 7.$$

This simplifies to $-1 \equiv 4k \pmod 7$.

• By inspection $-1 \equiv 4k$ has the solution $k \equiv 5 \pmod 7$. Using this $k$ and
plugging it back in to get a solution to $4x \equiv 1 \pmod{7^2}$, we get

$$2 + 7k = 2 + 7 \cdot 5 = 37 \pmod{7^2}$$

as the solution.

And indeed $4 \cdot 37 = 148 \equiv 1 \pmod{49}$.


Example 6.5.4 Let’s do it all again, more tersely, to get an inverse modulo
$7^3$, i.e. a solution to $4x \equiv 1$ modulo $7^3 = 343$.

• I already know that $[37]$ is the solution to $4x \equiv 1 \pmod{7^2}$. That means
that a solution to $4x \equiv 1 \pmod{7^3}$ must look like $37 + 7^2 \ell$ (for some
integer $\ell$).

• Plugging this in gives me $4(37 + 7^2 \ell) \equiv 1 \pmod{7^3}$, which rearranges to

$$4 \cdot 7^2 \ell \equiv 1 - 4 \cdot 37 \pmod{7^3}.$$

• Since we know that 37 solves $4x \equiv 1 \pmod{7^2}$, that means (by definition
of congruence) that

$$7^2 \mid 1 - 4 \cdot 37,$$

so we can divide “all three sides” of the last congruence by $7^2$, which
yields

$$4\ell \equiv \frac{1 - 4 \cdot 37}{7^2} = \frac{-147}{49} = -3 \equiv 4 \pmod 7.$$

• Solving this yields $\ell \equiv 1 \pmod 7$, so

$$x = 37 + 7^2 \cdot 1 = 86 \pmod{343}.$$

• And a quick check shows $4 \cdot 86 = 344 \equiv 1 \pmod{343}$ works.


You can do this as often as you like, and (properly interpreted) it will yield
all solutions of your congruence modulo $p^e$, one step at a time. We’ll see a
generalization of this in Section 7.2.
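
The lifting procedure of Examples 6.5.3 and 6.5.4 can be sketched as a short loop. This is plain Python under our own assumptions: the function name `inverse_mod_prime_power` is hypothetical, $a$ is assumed coprime to $p$, and the small congruences modulo $p$ are solved by searching through the $p$ residue classes.

```python
def inverse_mod_prime_power(a, p, e):
    """Return x with a*x = 1 (mod p^e), lifting one power of p at a time."""
    # Start with the solution modulo p, found by inspection (here: search).
    x = next(t for t in range(p) if (a * t - 1) % p == 0)
    modulus = p
    for _ in range(e - 1):
        # The current modulus divides 1 - a*x, so cancel it from the congruence.
        rhs = ((1 - a * x) // modulus) % p
        # Solve a*k = rhs (mod p) for the correction k.
        k = next(t for t in range(p) if (a * t - rhs) % p == 0)
        x = x + modulus * k
        modulus *= p
    return x


print(inverse_mod_prime_power(4, 7, 2))  # 37
print(inverse_mod_prime_power(4, 7, 3))  # 86
```

Each pass through the loop is one round of “cancel the known power of $p$, then solve a congruence modulo $p$”.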


6.6 Exercises 


1. A number such as 11, 111, 1111 is called a repunit. Clearly eleven is a
prime repunit. Find two more, say how you found them, and how you
confirmed they are prime. (Bonus: Do the same exercise in a base other
than decimal, or unary or binary!)

2. Find the prime numbers less than 100 using the Sieve of Eratosthenes
(6.2.3). Make sure you actually draw it! Every math student should do
this once, and only once.

3. Prove Lemma 6.3.6; if a prime $p$ divides a product $ab$, then $p$ divides at
least one of $a$ or $b$.

4. Prove Corollary 6.3.7; if a prime $p$ divides any finite product of numbers,
then $p$ divides at least one of them.

5. Assuming that $\gcd(a,b) = 1$, $a \mid c$, and $b \mid c$, show that $ab \mid c$ as well, using the
FTA. (This was proved earlier without it; see Proposition 2.4.10.)

6. Prove that if $\gcd(a,b) = 1$ and $a \mid bc$ then $a \mid c$ as well, using the
FTA. (This was proved earlier without it; see Exercise 2.5.19 and Proposition 2.4.10.)

7. Prove using the FTA that if $\gcd(a,b) = d$ then $\gcd\left(\frac{a}{d}, \frac{b}{d}\right) = 1$.

8. Assuming $a = \prod_{i=1}^{n} p_i^{e_i}$, $b = \prod_{i=1}^{n} p_i^{f_i}$, and $b \mid a$, find a formula to fill
in the question marks and prove it using the Fundamental Theorem of
Arithmetic:

$$a/b = \prod_{i=1}^{n} p_i^{??}$$

9. How would you describe a factorization of a rational number? Do you
think you could extend the Fundamental Theorem of Arithmetic to this
case? If so, how? If not, why would it not be appropriate?

10. Show that if $a$ and $b$ are positive integers and $a^2 \mid b^2$, then $a \mid b$.

11. Show that if $p^a \parallel m$ and $p^b \parallel n$, then $p^{a+b} \parallel mn$.

12. Prove Proposition 3.7.2 using the FTA; if $\gcd(m,n) = 1$ and $mn$ is a
perfect square, then so are $m$ and $n$.

13. By hand, find the prime factorizations of 36, 756, and 1001. Use these to
find the gcd of each pair of these three numbers.

14. Do the prime factorizations in Example 6.3.5.


. By hand, find the ged of 2? . 35-7? - 13-37 and 2° -3*- 11-317. 
. By any method you like, find the prime factorizations of 274—1 and 10°—1, 


as well as their gcd. 


Exercise Group. In the next few exercises, recall the definition of least
common multiple (or lcm) from Exercise 2.5.9.


17. Find the pairwise least common multiples in Exercises 6.6.13-6.6.15.


18. Find a formula for the lcm using Theorem 6.3.2 by filling in the question
marks:

$$\operatorname{lcm}(a,b) = \prod_{i=1}^{n} p_i^{??}$$


19. Prove that if $a, b > 0$ then $\gcd(a,b)\operatorname{lcm}(a,b) = ab$ using the FTA.


Exercise Group. Here are a few other interesting results that can be shown 
using prime factorizations as in Section 6.4. 


20. Is it possible for n! to end in exactly five zeros? 


21. Find a proof that $\sqrt{2}$ is irrational, and show exactly where it uses
the Fundamental Theorem of Arithmetic (or how it avoids using it).
Explain whether or not a similar proof to the one you found would
work for showing $\sqrt{3}$ and $\sqrt{6}$ are irrational.
22. Show that log, (5) is irrational. 


23. Show that 3?/° is irrational. 

24. How would Theorem 6.3.2 fail if we allowed n = 1 to have a prime factor- 
ization? What if we allowed 1 as a prime number? 

25. Prove that the only solutions of $x^2 \equiv x \pmod p$ are $x = [0]$ and $x = [1]$, if
$p$ is a prime. (Refer to Question 4.6.7; this and the next exercise answer
Exercise 4.7.19.)

26. Try to decide for exactly which composite moduli n the previous question 
is true. (Refer to the interact in Question 4.6.7; this and the previous 
exercise answer Exercise 4.7.19.) 

27. Find solutions to $3x - 4 \equiv 0$ (mod 25) and (mod 125) using the method
in Subsection 6.5.2, starting with modulus five.

28. Find solutions to $4x - 1 \equiv 0$ (mod 121) and (mod 1331) using the method
in Subsection 6.5.2, starting with modulus eleven.

29. Fill in the details of Example 6.5.2. 

30. Let $\mathbb{Z}[\sqrt{-5}]$ be the set of all numbers of the form $a + b\sqrt{-5}$ for $a, b \in \mathbb{Z}$.
Find two factorizations of $N = 6$ in this set (known as a ring), for which
none of the factors are $\pm 1$, nor for which any two factors differ by a
(multiplicative) factor of $\pm 1$.


Summary: Prime Time 


We can’t wait any longer! In this chapter we talk all about prime numbers. 


1. First, we define prime and composite numbers in Definition 6.1.1 and 
Definition 6.1.2. There is a lot of Prime fun to be had trying to find 
formulas for primes, or using Sage to compute. 


2. The foundational result enabling the rest of our usage of primes is Euclid’s 
proof of Infinitude of Primes, and the Sieve of Eratosthenes is a practical 
way to use this knowledge. 


3. We define prime factorization in Definition 6.3.1. Then the great theo- 
rem saying this is both always possible and unique is the Fundamental 
Theorem of Arithmetic. Some of the details of its proof are important 
on their own, such as Corollary 6.3.7. 


4. The following section gives many formulas that come directly as First 
consequences of the FTA. 


5. Finally, we make explicit the procedure for Converting to and from prime 
powers in solving congruences, along with several interesting examples 
such as Example 6.5.3 and Example 6.5.4. 


In the Exercises, the ones that practice the conceptual basis of the Fundamental 
Theorem of Arithmetic are the best. 
Chapter 7


First Steps With General Congruences


One can say a lot more about solving congruences. However, congruences also 
play a crucial role in solving all manner of other number-theoretic problems. 

In this chapter we collate a significant number of interesting results that the 
congruence framework affords us. Among them are some of the most important 
results we have access to at this early stage, including Fermat’s Little Theorem 
and Lagrange’s Theorem on polynomials. 


7.1 Exploring Patterns in Square Roots 


Just as in high school algebra one moved from linear functions to quadratics 
(and found there was a lot to say about them!), this is the next natural step in 
number theory. We will focus on congruences. We haven’t abandoned integers! 
But it turns out that questions about quadratic polynomials with integers are 
much, much harder, and are better pursued after studying the relatively simple 
(and computable) cases of quadratic congruences. Much later, we will return 
to a full investigation of this. 

You may recall that we looked at one particular quadratic congruence in 
Question 4.6.7 and Exercise 4.7.19, and saw that the solution depended at 
least partly on the modulus in Exercises 6.6.25 and 6.6.26. So we will examine 
these slightly simpler-sounding questions keeping in mind the structure of the 
modulus, not so much the actual answers. 


Question 7.1.1 Consider the following questions, even if the term ‘square 
root’ seems a bit odd right now. 


• For what prime $p$ does $-1$ have a square root?


• For what integers $n$ does 1 have more square roots than just $\pm 1$?


As we will precisely define in Definition 13.3.1, these questions are equivalent
to the following quadratic congruence questions.


• Is there a solution to

$$x^2 \equiv -1 \pmod p \quad\text{or}\quad x^2 + 1 \equiv 0 \pmod p?$$

• Are there more than the two obvious solutions to

$$x^2 \equiv 1 \pmod n \quad (\text{or equivalently } x^2 - 1 \equiv 0 \pmod n)?$$


Let’s look at each of these in turn. If you are online, you may use the 
following interacts, but they are merely an aid. It is quite possible to use 
pencil and paper to explore these as well. 


• An interact for which primes $-1$ has a square root:


@interact
def _(p=(13, prime_range(10,100))):
    # Tabulate x^2+1 for every residue class modulo the chosen prime p.
    pretty_print(html("Values of $x^2+1$ mod %s"%(p,)))
    pretty_print(html("<ul>"))
    for m in [0..p-1]:
        pretty_print(html(r"<li>$%s^2+1\equiv %s\text{ (mod }%s)$</li>"%(m,mod(m,p)^2+1,p)))
    pretty_print(html("</ul>"))


• An interact for when 1 has more square roots than just $\pm 1$, a rather
tricky question:


@interact
def _(n=(12,[10..100])):
    # Tabulate x^2-1 for the residues modulo the chosen n.
    pretty_print(html("Values of $x^2-1$ mod %s"%(n,)))
    pretty_print(html("<ul>"))
    for m in [0..n]:
        pretty_print(html(r"<li>$%s^2-1\equiv %s\text{ (mod }%s)$</li>"%(m,mod(m,n)^2-1,n)))
    pretty_print(html("</ul>"))


What do you get? See Exercise 7.7.1. To keep track of results, writing ideas 
in the margin of a physical book or in a small text document on a computer 
are both awesome. 


7.2 From Linear to General 


In this section, we will take two ideas we already used with linear congruences, 
and see how they can be modified to apply in any polynomial situation. (Note 
that, as in Fact 6.1.4, we only consider polynomials with integer coefficients.) 


7.2.1 Combining solutions 


One of the most important things we can do is study congruences with prime 
(power) modulus, because we can combine their solutions to get solutions for 
any congruences when we combine the Chinese Remainder Theorem and Fun- 
damental Theorem of Arithmetic (recall Proposition 6.5.1). Even more inter- 
estingly, we can combine the numbers of solutions. 

Informally, if you want to get the total number of solutions of a polynomial
congruence, just write the modulus as a product of prime powers $n = \prod_{i=1}^{k} p_i^{e_i}$,
find out how many solutions the congruence has with each prime power modulus,
then multiply those numbers for the total number of solutions.
Example 7.2.1 For instance, if $f(x) \equiv 0$ has 2 solutions modulo 3, 1 solution
modulo 5, and 3 solutions modulo 7, it would have $2 \cdot 1 \cdot 3 = 6$ solutions modulo
$105 = 3 \cdot 5 \cdot 7$.

We will state this for the general case of a coprime factorization of n, though 
again the prime power factorization is usually the most useful. 


Fact 7.2.2 Let $n_1, n_2, \ldots, n_k$ be a set of $k$ mutually coprime moduli. Suppose
that for some polynomial $f(x)$ you know that there are $N_i$ (congruence classes
of) solutions to

$$f(x) \equiv 0 \pmod{n_i}.$$

Then the congruence

$$f(x) \equiv 0 \pmod{\prod_{i=1}^{k} n_i} \quad\text{has}\quad \prod_{i=1}^{k} N_i \quad\text{total solutions.}$$

Proof. For all $i$, among the $N_i$ solutions to the $i$th congruence choose a solution
$a_i$, so that

$$f(a_i) \equiv 0 \pmod{n_i}.$$

Since the moduli $n_i$ for these congruences are coprime, we can use the Chinese
Remainder Theorem to obtain one number $a$ such that $a \equiv a_i \pmod{n_i}$ for all
$i$.

Since (integer) polynomials are exclusively made up of addition and multiplication
on integers, and addition and multiplication are well-defined, we also
have $f(a) \equiv f(a_i) \equiv 0 \pmod{n_i}$, so as promised we have a solution

$$f(a) \equiv 0 \pmod{\prod_{i=1}^{k} n_i}.$$

Each such set of $a_i$ will yield a solution, and if $\{a_i\}_{i=1}^{k} \neq \{b_i\}_{i=1}^{k}$, say with
$a_i \not\equiv b_i \pmod{n_i}$ for some $i$, then the resulting solutions certainly are not
equivalent modulo $\prod_{i=1}^{k} n_i$ either.

Now multiply how many solutions there are for each $n_i$ to get the total number
of combinations of solutions. If there are $N_i$ solutions modulo $n_i$, we would get
$\prod_{i=1}^{k} N_i$. There aren’t any additional answers, because any answer to the ‘big’
congruence automatically also satisfies the ‘little’ ones; if $\prod_{i=1}^{k} n_i \mid f(a)$, then
certainly $n_i \mid f(a)$ as well. ∎
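
A brute-force count is an easy way to watch Fact 7.2.2 happen. The following plain Python sketch (3.8 or later, for `math.prod`) counts solutions one modulus at a time and compares the product with the count for the combined modulus. The polynomial $x^2 - 1$, the moduli 3, 5, 7, and the helper name `count_solutions` are all just assumptions chosen for illustration.

```python
from math import prod


def count_solutions(coeffs, n):
    """Count classes x mod n with c0 + c1*x + c2*x^2 + ... = 0 (mod n)."""
    def f(x):
        total = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            total = total * x + c
        return total
    return sum(1 for x in range(n) if f(x) % n == 0)


coeffs = [-1, 0, 1]   # the polynomial x^2 - 1, listed from constant term up
moduli = [3, 5, 7]    # mutually coprime
counts = [count_solutions(coeffs, m) for m in moduli]
print(counts, prod(counts))                 # [2, 2, 2] 8
print(count_solutions(coeffs, prod(moduli)))  # 8
```

Changing `coeffs` or `moduli` (keeping the moduli coprime) gives a quick experiment with other cases.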


7.2.2 Prime power congruences 


We have already discussed prime power congruences in Subsection 6.5.2. Recall
that in Examples 6.5.3 and 6.5.4 we took the (obvious) solution of
$4x \equiv 1 \pmod 7$ (namely, $x = [2]$), and got solutions (mod 49) and even (mod 343)
from it relatively easily.

But that is essentially the same as asking for solutions to $4x - 1 \equiv 0$, a
linear congruence. Let’s see if we can generalize this method for more general
polynomial congruences.

The key was taking the already known fact $7 \mid 1 - 4 \cdot 2$ and then cancelling
out 7 from the entire congruence to get that

$$\frac{1 - 4 \cdot 2}{7} \equiv 4k \pmod 7.$$

We were able to solve the resulting congruence $-1 \equiv 4k \pmod 7$, which had
solution $k \equiv 5 \pmod 7$. Finally, we plugged that back in to get a solution to
$4x \equiv 1 \pmod{7^2}$, which was

$$2 + 7k = 2 + 7 \cdot 5 = 37 \pmod{7^2}.$$

Can we use this approach to get solutions to more advanced congruences 
as well, like the simple quadratics we’ve started exploring in this chapter? The 
answer is yes, with a minor caveat. The preceding discussion was just a basic 
form of the following. 


Theorem 7.2.3 Hensel’s Lemma. For $p$ prime and $e \ge 2$, suppose you
already know a solution equivalence class $x_{e-1}$ (mod $p^{e-1}$) of the (polynomial)
congruence

$$f(x) \equiv 0 \pmod{p^{e-1}}.$$

Assume the technical condition that $\gcd(p, f'(x_{e-1})) = 1$. Then there is also a
solution to

$$f(x) \equiv 0 \pmod{p^e}$$

of the form (and unique of this form)

$$x_e = x_{e-1} + k p^{e-1}$$

where $k$ satisfies

$$\frac{f(x_{e-1})}{p^{e-1}} + k \cdot f'(x_{e-1}) \equiv 0 \pmod p.$$
Proof. If $p$ and $f'(x_{e-1})$ are relatively prime, then by Proposition 5.1.1 any
linear congruence of the form $f'(x_{e-1}) k \equiv b \pmod p$ with coefficient $a =
f'(x_{e-1})$, unknown $k$, and known $b$ can be solved (uniquely modulo $p$, given
the gcd condition). Since $x_{e-1}$ is a known zero of $f(x)$ for modulus $p^{e-1}$, we
know that as an integer (not modulo anything) $p^{e-1} \mid f(x_{e-1})$.

This means that $\frac{f(x_{e-1})}{p^{e-1}}$ exists in $\mathbb{Z}$, so if we set $b = -\frac{f(x_{e-1})}{p^{e-1}}$ there will
indeed be a solution $k$ to the congruence $\frac{f(x_{e-1})}{p^{e-1}} + k \cdot f'(x_{e-1}) \equiv 0 \pmod p$.
Then the only question becomes why $x_e = x_{e-1} + k p^{e-1}$ is actually a solution
to $f(x) \equiv 0 \pmod{p^e}$.

To see this, think of $f$ as a polynomial with terms of the form $c_i x^i$, e.g. $f(x) =
\sum_{i=0}^{d} c_i x^i$. Then $f(x_{e-1} + k p^{e-1})$ can be expanded out term-by-term in the
following form:

$$f(x_e) = f(x_{e-1} + k p^{e-1}) = \sum_{i=0}^{d} c_i (x_{e-1} + k p^{e-1})^i.$$

Let’s break this down on a term-by-term basis in the sum. Each term will look
like

$$c_i (x_{e-1} + k p^{e-1})^i = c_i x_{e-1}^{i} + c_i \cdot i x_{e-1}^{i-1} \cdot k p^{e-1} + \text{terms with at least } p^{(e-1)2}.$$

Since $e \ge 2$ in this context, the extra terms (from Taylor or binomial series¹)
involving $p^{(e-1)2}$ will be divisible by at least $p^e$ and hence be trivial in that
modulus. Thus, each term in the sum will be equivalent to

$$c_i x_{e-1}^{i} + c_i \cdot i x_{e-1}^{i-1} \cdot k p^{e-1} \pmod{p^e}.$$

Now add up the terms of the sum for all $i$ to find out something about $f(x_e)$.
Summing up the $c_i x_{e-1}^{i}$ will give us $f(x_{e-1})$, while summing up $i x_{e-1}^{i-1}$ is adding
terms that look like the derivative of polynomials, so the sum of $c_i \cdot i x_{e-1}^{i-1} \cdot k p^{e-1}$
yields $f'(x_{e-1}) \cdot k p^{e-1}$. Summarizing this paragraph,

$$f(x_e) \equiv f(x_{e-1}) + f'(x_{e-1}) \cdot k p^{e-1} \pmod{p^e}.$$

By hypothesis $p^{e-1} \mid f(x_{e-1})$, and obviously $p^{e-1} \mid f'(x_{e-1}) \cdot k p^{e-1}$ and $p^e$;
so by necessity $p^{e-1} \mid f(x_e)$ as well. Now recall Proposition 5.2.6, where we
are allowed to cancel a nonzero divisor from “all three sides” of a congruence.
Then we have that

$$f(x_e)/p^{e-1} \equiv \frac{f(x_{e-1})}{p^{e-1}} + f'(x_{e-1}) \cdot k \pmod p$$

but the right-hand expression is divisible by $p$ by our original hypothesis, so
$f(x_e)/p^{e-1} \equiv 0 \pmod p$. Using Proposition 5.2.6 again we multiply everything
by $p^{e-1}$ and obtain

$$f(x_e) \equiv 0 \pmod{p^e}$$

as desired. ∎


Historical remark 7.2.4 Hensel’s Lemma. The German mathemati- 
cian Kurt Hensel was a grandson of the famous pianist and composer Fanny 
Mendelssohn; he was apparently the first one to use the term Fermat’s Little 
Theorem for the result we will see in Theorem 7.5.3. The lemma as presented 
here is only the finite case of his use of it to develop the p-adic numbers, which
one may think of as power series expansions of modular arithmetic. See [E.2.15]
for a good project introducing them.

Let’s use Hensel’s Lemma to take solutions to $x^2 + 1 \equiv 0 \pmod 5$ and turn
them into solutions modulo 25 and 125. By inspection, the solutions to this
congruence modulo 5 are $[2], [3]$ (or $[\pm 2]$).


Example 7.2.5 First let’s tackle $x^2 + 1 \equiv 0 \pmod{25}$. By the preceding
remark and the lemma, solutions modulo 25 will look like $3 + k \cdot 5$ or $2 + k \cdot 5$.
Further, $f'(x) = 2x$, so for either solution modulo 5 the technical derivative
condition is met.

Let $x_1 = 3$. Then the condition for $k$ is

$$\frac{x_1^2 + 1}{5} + k \cdot (2 x_1) \equiv 0 \pmod 5$$

which simplifies to $2 + 6k \equiv 0$, which solves to $k \equiv -2 \equiv 3$. Then our solution
to the congruence modulo 25 would be

$$x_2 = x_1 + 3 \cdot 5 = 3 + 3 \cdot 5 = 18 \pmod{25}.$$

And indeed $18^2 + 1 = 325$ is divisible by twenty-five.
Now try the same procedure with $x_1 = 2$ to get the solution $x_2 = 7$ in
Exercise 7.7.3. (If you get stuck, see Example 16.1.3.)


Example 7.2.6 We can try the same process with $e = 3$. Taking (from the
previous example, or the affiliated exercise) $x_2 = 7$ yields, as a condition for $k$,

$$\frac{7^2 + 1}{25} + 2 \cdot 7k \equiv 0 \pmod 5.$$

This reduces to $14k \equiv -2 \pmod 5$, which gives $k \equiv 2$. Indeed,

$$x_3 = x_2 + 2 \cdot 5^2 = 7 + 2 \cdot 5^2 = 57$$

yields

$$57^2 + 1 = 3250 \equiv 0 \pmod{125}.$$

It’s good practice to try the same process with $x_2 = 18$ instead in Exercise 7.7.3.
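
One step of Hensel’s Lemma is short enough to write out directly. The sketch below is plain Python (3.8 or later, for `pow` with a negative exponent); the function names and the convention of storing a polynomial as a list of coefficients from the constant term upward are our own assumptions for illustration. It reproduces the lifts $2 \to 7 \to 57$ for $x^2 + 1$ from the examples above.

```python
def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... by Horner's rule."""
    total = 0
    for c in reversed(coeffs):
        total = total * x + c
    return total


def poly_derivative(coeffs):
    """Coefficients of the formal derivative."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def hensel_step(coeffs, x_prev, p, e):
    """Lift a root x_prev of f modulo p^(e-1) to a root modulo p^e."""
    q = p ** (e - 1)
    fx = poly_eval(coeffs, x_prev)
    dfx = poly_eval(poly_derivative(coeffs), x_prev)
    assert fx % q == 0, "x_prev must be a root modulo p^(e-1)"
    assert dfx % p != 0, "technical condition: gcd(p, f'(x_prev)) = 1"
    # Solve f(x_prev)/p^(e-1) + k*f'(x_prev) = 0 (mod p) for k.
    k = (-(fx // q) * pow(dfx, -1, p)) % p
    return (x_prev + k * q) % (p * q)


f = [1, 0, 1]                   # x^2 + 1
x2 = hensel_step(f, 2, 5, 2)
x3 = hensel_step(f, x2, 5, 3)
print(x2, x3)                   # 7 57
```

The same function handles the linear case of Examples 6.5.3 and 6.5.4 if you pass the coefficients `[-1, 4]` for $4x - 1$.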


¹ One way or another one of these series will have to enter in, unfortunately; [E.2.1,
Section 4.3] has more of a binomial theorem-esque treatment, while [E.2.13, Theorem 4.7] and
[E.5.1, Theorem 6.2] more explicitly invoke Taylor series.
This is a very powerful technique. What is most interesting is that this is
even interpretable as Newton’s method in calculus. How? Note that the result
above can be rearranged as

$$x_e = x_{e-1} - \frac{f(x_{e-1})}{f'(x_{e-1})}$$

since $p^{e-1} \mid f(x_{e-1})$ and the technical condition is tantamount to saying
$f'(x_{e-1})$ has an inverse.


Remark 7.2.7 In Newton’s method, if the derivative is zero our search for a
solution fails. However, given the overall hypotheses of Hensel’s Lemma where
instead $\gcd(p, f'(x_{e-1})) \neq 1$, finding some information is possible.

Examining the proof of Theorem 7.2.3, one can see that if $p \mid f'(x_{e-1})$, the
congruence $\frac{f(x_{e-1})}{p^{e-1}} + k \cdot f'(x_{e-1}) \equiv 0 \pmod p$ can still have a solution if (and
only if) $p \mid \frac{f(x_{e-1})}{p^{e-1}}$ as well (that is, if $p^e \mid f(x_{e-1})$). In fact, in this case it
doesn’t matter what $k$ is! So all numbers of the form $x_e = x_{e-1} + k p^{e-1}$ work.


If you didn’t notice this calculus connection, don’t feel bad! When we had
the linear congruence $f(x) = 4x - 1$ in Examples 6.5.3 and 6.5.4, the derivative
was just $f'(x) = 4$ and it was not at all obvious that anything more than a trick
was involved. Still, it’s another fascinating place where ideas from calculus can
invade the world of number theory.


7.3 Congruences as Solutions to Congruences 


We need to start applying these ideas more. In Section 7.1 we explored the
number of solutions to $x^2 - 1 \equiv 0 \pmod n$ for arbitrary $n$. It should be clear
we expect at least two solutions once we move past the trivial case $n = 2$, but
why are there sometimes more?

Could we ever get a comprehensible answer to that question? Online, try 
the following interact to see if you find any patterns. 


@interact
def _(n=(12,[10..110])):
    counter = 0  # number of residues m with m^2-1 = 0 (mod n)
    pretty_print(html("Values of $x^2-1$ mod %s"%(n,)))
    pretty_print(html("<ul>"))
    for m in [0..n]:
        pretty_print(html(r"<li>$%s^2-1\equiv %s\text{ (mod }%s)$</li>"%(m,mod(m,n)^2-1,n)))
        if mod(m,n)^2-1==0:
            counter += 1
    pretty_print(html("</ul>"))
    pretty_print(html(r"There are $%s$ solutions to $x^2-1\equiv 0$ (mod $%s$)."%(counter,n)))


Since $x^2 - 1$ is a polynomial, our knowledge of Fact 7.2.2 suggests we should
try to answer this by looking at different prime power moduli first, then multiply
the answers.

The key idea we will use is this. For a prime $p$,

$$p \mid x^2 - 1 = (x-1)(x+1) \quad\text{implies}\quad x \equiv \pm 1 \pmod p.$$

More generally, $p^e \mid (x-1)(x+1)$ implies $p$ divides $x-1$ or $x+1$. So we should
just look at various $p^e$.
If $p$ is odd (and hence greater than two), the two possibilities $p \mid x - 1$
and $p \mid x + 1$ are mutually exclusive, so all the factors of $p$ in $p^e$ divide the
same factor of $x^2 - 1$. So $p^e \mid (x+1)$ or $p^e \mid (x-1)$ are the only possibilities
($x = [\pm 1]$) and there are two solutions.

However, if $p = 2$ then simultaneously having $2 \mid x - 1$ and $2 \mid x + 1$ is
definitely possible, so there could be more than two solutions. We examine
three cases.


• We know that $\pm 1$ are still the only solutions modulo $2^2$ and $2^1$. In the
latter case $+1 \equiv -1$, so then there is actually only one solution.


• However, modulo $2^3$ it’s possible that $2 \mid (x+1)$ and $2^2 \mid (x-1)$, or vice
versa, so that $2^2 \pm 1 = 3, 5$ are also solutions to the congruence.


• When the modulus is a higher power of 2 this sort of thing can happen,
too. For instance, when $e = 5$ one could have $2 \mid (x+1)$ and $2^4 \mid (x-1)$.
However, it’s not possible that $2^2 \mid (x+1)$ and $2^2 \mid (x-1)$ because
numbers two apart can’t both be divisible by four. So the only other
possibility is that $2 \mid (x+1)$ and $2^{e-1} \mid (x-1)$, or vice versa, which is a
total of four solutions. (See Exercise 7.7.15 to confirm these do all give
solutions.)

That means we get a very intriguing answer.

Fact 7.3.1 Let $k$ be the number of different odd primes that divide $n$. Consider the congruence $x^2-1\equiv 0 \pmod n$. Then:

• There are $2^k$ solutions if $n$ is odd.
• There are $1\cdot 2^k=2^k$ solutions if $n$ is divisible by 2 but not by 4.
• There are $2\cdot 2^k=2^{k+1}$ solutions if $n$ is divisible by 4 but not by 8.
• There are $4\cdot 2^k=2^{k+2}$ solutions if $n$ is divisible by 8.

Proof. Use Fact 7.2.2 and the argument above. ∎
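If you would like to test Fact 7.3.1 numerically, here is a minimal sketch in plain Python rather than Sage. The function names are my own, for illustration only. It counts square roots of one by brute force and compares the count with the four cases of the fact.

```python
def count_square_roots_of_one(n):
    """Brute-force count of x in 0..n-1 with x^2 - 1 divisible by n."""
    return sum(1 for x in range(n) if (x * x - 1) % n == 0)

def odd_prime_divisor_count(n):
    """Number k of distinct odd primes dividing n, by trial division."""
    while n % 2 == 0:          # strip the power of two first
        n //= 2
    k, d = 0, 3
    while d * d <= n:
        if n % d == 0:
            k += 1
            while n % d == 0:
                n //= d
        d += 2
    if n > 1:                  # whatever is left is one more odd prime
        k += 1
    return k

def predicted_count(n):
    """The count claimed by Fact 7.3.1."""
    k = odd_prime_divisor_count(n)
    if n % 2 == 1:
        return 2**k
    if n % 4 == 2:
        return 2**k
    if n % 8 == 4:
        return 2**(k + 1)
    return 2**(k + 2)

for n in (7, 8, 15, 24, 120):
    print(n, count_square_roots_of_one(n), predicted_count(n))
```

A check over a range of moduli is only evidence, of course, not a proof; the proof is the argument above.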
What does this have to do with the title of this section? Let’s recast the 
result. 


Fact 7.3.2 We can list all possible solutions to $x^2-1\equiv 0 \pmod n$ based on $k$, the number of odd primes that divide $n$, and based on the equivalence class of $n$ modulo 8.

• There are $2^k$ solutions if $n\equiv 1 \pmod 2$, or when $n\equiv 1,3,5,7 \pmod 8$.
• There are $2^k$ solutions if $n\equiv 2 \pmod 4$, or when $n\equiv 2,6 \pmod 8$.
• There are $2\cdot 2^k=2^{k+1}$ solutions if $n\equiv 4 \pmod 8$.
• There are $4\cdot 2^k=2^{k+2}$ solutions if $n\equiv 0 \pmod 8$.


This is only the first of many such results. 


7.4 Polynomials and Lagrange’s Theorem 


We've seen several times in this chapter that although one can have theorems of various kinds for congruences, polynomials seem to behave very nicely, even to the point of allowing us to prove statements about the integer output of polynomials!

At the same time, it's clear that for good behavior, there is no substitute for prime moduli; the results in the previous sections really confirm this. So how can we combine polynomials and prime modulus?

The answer was given by Joseph-Louis Lagrange in the next theorem. The proof proceeds via induction on the degree $d$ of the polynomial. It is fairly detailed², so feel free to try it out with specific numbers.


Theorem 7.4.1 Lagrange's Theorem for Polynomials. If $p$ is prime and $f(x)$ is a degree $d$, integer coefficient, non-trivial polynomial (i.e. $f$ not identically zero or with all coefficients divisible by $p$), then there are at most $d$ congruence classes of solutions of $f(x)\equiv 0$ modulo $p$.

Proof. First, consider the case where there are no solutions to $f(x)\equiv 0 \pmod p$. Then there is nothing further to prove, since $0\le d$ for any polynomial. This actually proves a base case, for if the degree is $d=0$ then $f(x)=c$ for $c\not\equiv 0$. (If $c\equiv 0$ we have the trivial polynomial, which is the excluded case.)
For another base case, suppose that the degree $d=1$. Then we have $ax+b\equiv 0 \pmod p$, which is the same as $ax\equiv -b \pmod p$. In this case $\gcd(a,p)=1$ and there is exactly one solution by Proposition 5.1.3 (if $ax+b$ is actually going to have a linear term, otherwise $p \mid a$).

Now we'll use some induction. Let's assume that all polynomials with degree $e$ less than $d$ have at most $e$ solutions modulo $p$, and try to examine a generic polynomial $f$ of degree $d$:

$$f(x)=a_dx^d+a_{d-1}x^{d-1}+\cdots+a_1x+a_0.$$

We already dealt with the case where $f$ has no solutions, so assume further that $f(b)\equiv 0 \pmod p$ for at least one congruence class $[b]$. Consider the following expansion of $f(x)-f(b)$, which is congruent to $f(x)$ modulo $p$ because $f(b)\equiv 0$:

$$f(x)-f(b) = (a_dx^d+a_{d-1}x^{d-1}+\cdots+a_1x+a_0)-(a_db^d+a_{d-1}b^{d-1}+\cdots+a_1b+a_0)$$
$$= a_d(x^d-b^d)+a_{d-1}(x^{d-1}-b^{d-1})+\cdots+a_1(x-b).$$

Now recall the factorization

$$(x^k-b^k)=(x-b)(x^{k-1}+\cdots+b^{k-1}).$$

Apply it to the previous formula to factor out $x-b$:

$$(x-b)\cdot(\text{A bunch of other Stuff}).$$

Note that “Stuff” is strictly of degree less than $d$.
Now we can rewrite $f(x)$ modulo $p$, recalling that $f(b)\equiv 0$:

• $f(x)\equiv f(x)-f(b)=(x-b)\cdot \text{Stuff}(x)$

Therefore

$$f(x)\equiv (x-b)\cdot \text{Stuff}(x)\equiv 0 \pmod p$$

implies that $p$ divides the product of $x-b$ and the stuff. Crucially, by Lemma 6.3.6 we know $p$ divides one of these two factors.

Since the “Stuff” function must be a polynomial of degree less than $d$, there are at most $d-1$ solutions to it modulo $p$ if $p$ divides “Stuff”. If $p$ divides $x-b$ instead, that is only one more solution for $f(x)$, so there are a total of at most $d$ solutions available for $f(x)$, including $x\equiv b$.

Finally, $f(x)$ was an arbitrary polynomial of degree $d$, so the induction statement is proved, and by induction, the theorem works for any non-trivial polynomial. ∎

² And pieces are independently useful. The factorization of $x^k-b^k$ could prove Fact 4.2.3; see also Exercise 7.7.6.
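To see the bound in Theorem 7.4.1 in action (and to see it fail for a composite modulus), one can simply list roots by brute force. The following is a small sketch in plain Python, not Sage; the helper `roots_mod` and its coefficient convention are hypothetical names chosen for this illustration.

```python
def roots_mod(coeffs, n):
    """Residues x in 0..n-1 with f(x) = 0 modulo n.

    coeffs lists the coefficients as [a_0, a_1, ..., a_d].
    """
    def f(x):
        value = 0
        for a in reversed(coeffs):      # Horner's rule, reducing as we go
            value = (value * x + a) % n
        return value
    return [x for x in range(n) if f(x) == 0]

print(roots_mod([-1, 0, 1], 13))  # x^2 - 1 modulo a prime: at most two roots
print(roots_mod([-1, 0, 1], 8))   # x^2 - 1 modulo 8: more than two roots
print(roots_mod([1, 0, 1], 3))    # x^2 + 1 modulo 3: no roots at all
```

Trying a few polynomials of your own against several primes is a good way to get a feel for why the prime hypothesis matters.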

We just saw Theorem 7.4.1 isn't true for general moduli. For example, in Fact 7.3.1 we got as many as $2^{k+2}$ solutions to $x^2-1\equiv 0$ for moduli that looked like $8p_1p_2\cdots p_k$. We would expect only two with Lagrange's Theorem for Polynomials!

But there cannot be more than two solutions to the $x^2\pm 1$ problems modulo a prime. If we find two solutions, we have all of them. This proves to be quite useful to keep things from going crazy when we are trying to investigate congruences; if we keep the modulus prime, we will be okay.

Of course, not every polynomial has the full number of solutions that Theorem 7.4.1 allows; consider $x^2\equiv 0 \pmod p$. We might not even get two in interesting instances of a quadratic polynomial; for example, $x^2+1\equiv 0$ doesn't have a solution modulo three (just try all three options to check). The following interact investigates this a bit more.


@interact
def _(n=(13,prime_range(100))):
    counter = 0
    pretty_print(html("Zero values of $x^2+1$ mod %s"%(n,)))
    pretty_print(html("<ul>"))
    for m in [0..n-1]:
        if mod(m,n)^2+1==0:
            pretty_print(html(r"<li>$%s^2+1\equiv %s\text{ (mod }%s)$</li>"%(m,mod(m,n)^2+1,n)))
            counter += 1
    pretty_print(html("</ul>"))
    pretty_print(html(r"There are $%s$ solutions to $x^2+1\equiv 0$ (mod $%s$)."%(counter,n)))


Maybe it's not so surprising that sometimes $x^2+1\equiv 0$ has no solutions, since $x^2+1=0$ doesn't have any real solutions either. Could there be connections or parallels between these cases?


7.5 Wilson’s Theorem and Fermat’s Theorem 


Polynomials aren’t the only types of formulas we will see. Here, we introduce 
two famous theorems about other types of congruences modulo p (a prime) 
that will come in very handy in the future. 


7.5.1 Wilson’s Theorem 


Theorem 7.5.1 Wilson's Theorem. If $p$ is a prime, then

$$(p-1)!\equiv -1 \pmod p,$$

where the exclamation point here indicates the factorial.

Proof. If $p=2$ this is very, very easy to check. So assume $p\neq 2$, hence $p-1$ is even. Now we will think of all the numbers from $1$ to $p-1$, which will be multiplied to make the factorial. (We will put the example $p=11$ in bullets to help follow.)

For each $n$ such that $1<n<p-1$, we know that $n$ has a unique inverse modulo $p$. Pair up all the numbers between (not including) $1$ and $p-1$ in this manner.

• If $p=11$, we pair up $(2,6)$, $(3,4)$, $(5,9)$, and $(7,8)$.

Then multiplying out $(p-1)$ factorial, we can reorder the terms using the pairs, and notice much cancellation:

$$(p-1)! = 1\cdot 2\cdot 3\cdots (p-2)\cdot (p-1) = 1\cdot a\cdot a^{-1}\cdot b\cdot b^{-1}\cdots (p-1)$$
$$\equiv 1\cdot 1\cdot 1\cdots 1\cdot (p-1) \equiv -1 \pmod p$$

• For instance, if $p=11$, we pair up
$$10! = 1\cdot 2\cdots 9\cdot 10 = 1\cdot (2\cdot 6)\cdot (3\cdot 4)\cdot (5\cdot 9)\cdot (7\cdot 8)\cdot 10$$
which simplifies to
$$10! \equiv 1\cdot 1\cdot 1\cdot 1\cdot 1\cdot (-1) \equiv -1 \pmod{11}$$

Beautiful!
The only loose end is that perhaps some number pairs up with itself, which would mess up that all the numbers pair off nicely. However, in that case, $a^2\equiv 1 \pmod p$, so by definition $p \mid (a-1)(a+1)$; since $p$ is a prime greater than two, it must divide one (and only one) of these factors (recall Lemma 6.3.6). In these cases $a\equiv 1$ or $a\equiv p-1$. But we were not pairing off $1$ or $p-1$, so this can't happen. ∎
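The pairing in this proof is easy to reproduce by machine. Here is a minimal sketch in plain Python (3.8 or later, since it relies on the built-in `pow(a, -1, p)` for modular inverses); the function name `inverse_pairs` is just an illustrative choice.

```python
from math import factorial

def inverse_pairs(p):
    """Pair each a with 1 < a < p-1 with its inverse modulo the prime p."""
    pairs, seen = [], set()
    for a in range(2, p - 1):
        if a in seen:
            continue
        b = pow(a, -1, p)        # modular inverse of a
        pairs.append((a, b))
        seen.update((a, b))
    return pairs

print(inverse_pairs(11))         # the pairs used in the bullets above
print(factorial(10) % 11)        # 10, which is -1 modulo 11
```

Running it for other primes shows the same picture: everything strictly between $1$ and $p-1$ pairs off, leaving only $1\cdot(p-1)$.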
Exercise 7.7.7 is to show that the conclusion of Wilson's theorem fails for $p=10$. That is, that $(10-1)!\not\equiv -1 \pmod{10}$. So does it work or not for other moduli?


@interact
def _(n=range_slider(2,100,1,(3,9))):
    for modulus in [n[0]..n[1]]:
        pretty_print(html(r"$(%s-1)!\equiv %s$ (mod $%s$)"%(modulus,mod(factorial(modulus-1),modulus),modulus)))


Remark 7.5.2 See Exercise 7.7.11 once you have explored this for a while. The first known modern proofs were by Lagrange, but (while solving a general system of linear congruences, see [E.7.47]) Ibn Al-Haytham had already enunciated this statement by the eleventh century! For nice combinatorial proofs, see Subsection 7.8.2 or [E.7.27]. If you are really curious, see Wikipedia³ or Alexander Walker's blog⁴ for a generalization due to Gauss; a somewhat different approach to generalization is taken in [E.7.28].


7.5.2 Fermat’s Little Theorem 


If one explores a little with powers of numbers modulo p a prime, one usually 
notices some pattern of those powers. This is the best-known, and soon we'll 
reinterpret it in a powerful way. 


³ en.wikipedia.org/wiki/Wilson%27s_theorem#Gauss%27s_generalization
⁴ awwalker.com/2017/02/05/a-generalization-of-wilsons-theorem-due-to-gauss/

Theorem 7.5.3 Fermat's Little Theorem. If $\gcd(a,p)=1$ for $p$ a prime, then

$$a^{p-1}\equiv 1 \pmod p.$$

Proof. Sketch of proof (to fill in, see Exercise 7.7.10):

• If $\gcd(a,p)=1$ and $p$ is prime, show that $\{a,2a,3a,\ldots,(p-1)a,pa\}$ is a complete residue system (mod $p$).

  ◦ That is, show that the set $\{[a],[2a],[3a],\ldots,[pa]\}$ is the same as the complete set of residues $\{[0],[1],[2],\ldots,[p-1]\}$, though possibly in a different order.

• If $p$ is prime and $p$ does not divide $a$, show that

$$a\cdot 2a\cdot 3a\cdots (p-1)a\equiv 1\cdot 2\cdot 3\cdots (p-1) \pmod p.$$

• Now use Wilson's Theorem and multiply by $-1$.

∎
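Before filling in the sketch, it may help to check its first step and its conclusion numerically. This is a small illustration in plain Python, not part of the proof; both function names are made up for this purpose.

```python
def multiples_are_complete_residue_system(a, n):
    """Do a, 2a, ..., na hit every residue 0..n-1 exactly once modulo n?"""
    return sorted((k * a) % n for k in range(1, n + 1)) == list(range(n))

def fermat_holds(a, p):
    """Is a^(p-1) congruent to 1 modulo p?"""
    return pow(a, p - 1, p) == 1

print(multiples_are_complete_residue_system(3, 7))   # prime modulus
print(fermat_holds(3, 7))
print(multiples_are_complete_residue_system(2, 6))   # gcd(2, 6) is not 1
```

The last line shows what goes wrong when $\gcd(a,n)\neq 1$: the multiples repeat, so they cannot cover every residue.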

Like with most important theorems, there are many other ways to prove it as well; in Section 7.8 we'll provide a counting-based proof. See [E.7.40] for an analysis of interrelationship with a focus on mechanizing proof. We'll see a more abstract approach after we introduce the concept of groups in Chapter 8; see Exercise 9.6.2.

So despite the innocuous appearance of this result as a corollary of another theorem, do not be fooled; it is incredibly powerful. As an example, it provides the primary tool in Fermat's proof that $2^{37}-1$ is not prime⁵; imagine discovering this factorization by hand!


print(2^37-1)
print(factor(2^37-1))


137438953471 
223 * 616318177 


7.6 Epilogue: Why Congruences Matter 


Although we will spend some significant time working on solving congruences, 
I don’t want to lose sight of deeper questions. To see how congruences help 
address them, recall the search in Section 7.1 for primes p such that 


$$x^2\equiv -1 \pmod p$$


has a solution. The table given by the following interact is organized a little 
more; if online, try to find a pattern in which p yield solutions and which do 
not. 


import itertools

@interact
def _(n=20):
    yeslist=[]
    nolist=[]
    for p in prime_range(3,n):
        res = 0
        # look for a square root of -1 modulo p
        for res in [0..p]:
            if mod(res,p)^2+1 == 0:
                yeslist.append(p)
                break
        else:
            # the loop finished without finding one
            nolist.append(p)
    t = [['exist', 'do not exist']] + [[a,b] for (a,b) in itertools.zip_longest(yeslist,nolist)]
    # blank out the padding in the shorter column
    for item in t:
        for i in range(len(item)):
            if item[i] is None:
                item[i]=''
    pretty_print(html(r"Solutions to $x^2\equiv -1$ (mod $p$) for $2\le p\le %s$:"%n))
    pretty_print(html(table(t, header_row = True, frame = True)))

⁵ For more on this story see [E.5.8, page 57]; for more on this type of number see Definition 12.1.6.


Question 7.6.1 Do you see a pattern related to some kind of congruence? 
(This one should be more apparent than in Section 7.3; see also Exercise 7.7.12.) 


The reason I point this kind of thing out is not just because I can, but 
because it shows simple congruence patterns can have a big result. We will 
prove a result about integers, assuming something about congruences. 

Recall our brief search through Mordell/Bachet curves in Section 3.5. Let's look at the particular case $x^3=y^2-7$.

Figure 7.6.2 Solutions of $x^3=y^2-7$ in several viewing windows

It's amazing how the curve slips between every integer lattice point... So it seems that a perfect square can't ever be exactly seven more than a perfect cube. Is this true? Here's where congruences come into play.


Proposition 7.6.3 Showing a Mordell curve has no integer point. There are no integers $x,y$ such that $x^3=y^2-7$, so there are no integer points on this curve.

As a prefatory note, this proof will depend upon the results of our exploration at the beginning of this section⁶. We will eventually prove these conjectures in Fact 13.3.2, which will allow us to claim full proof of this statement in Fact 15.3.3. However, you may want to try to find an “elementary” proof of the conjecture in Exercise 7.7.12.

⁶ Davenport in [E.4.10] and Conrad credit this proof to the same Lebesgue mentioned in the rediscovery of Qin's generalized Chinese Remainder Theorem in Subsection 5.5.1.

Proof of Proposition 7.6.3. For convenience, we will rewrite $x^3=y^2-7$ as $y^2=x^3+7$. To begin the proof, first consider the case where $x$ is even. Then $2 \mid x$, so $8 \mid x^3$. That means $y^2\equiv 7 \pmod 8$.

[i^2 for i in Integers(8)]

[0, 1, 4, 1, 0, 1, 4, 1]

Unfortunately, the only perfect squares mod (8) seem to be 0, 1, and 4. So this is not possible.

What about if $x$ is odd? Then $y$ must be even, since $x^3$ and $7$ are odd. So let's examine whether $x\equiv 1 \pmod 4$ or $x\equiv 3 \pmod 4$, the next two options.

• If $x\equiv 3 \pmod 4$, then $x^3\equiv 27\equiv 3 \pmod 4$, so $x^3+7\equiv 10\equiv 2 \pmod 4$. But we already know from earlier that perfect squares are only 0 or 1 modulo 4, so that's not possible.

• So it must be the case that $x\equiv 1 \pmod 4$.

Now we do a trick like that of completing the square:

$$y^2=x^3+7 \Rightarrow y^2+1=x^3+8 \Rightarrow y^2+1=(x+2)(x^2-2x+4)$$

Let's analyze this carefully in the following argument.

• If $x\equiv 1 \pmod 4$, then $x+2\equiv 3 \pmod 4$.

• So not only is $x+2$ an odd number, but also it must be divisible by a prime $q$ of the form $4n+3$. (Otherwise all its primes look like $4n+1\equiv 1$, the product of which would also be $\equiv 1 \pmod 4$.)

• If $q$ divides $x+2$, it (naturally) divides $(x+2)(x^2-2x+4)$ as well. But if it divides $(x+2)(x^2-2x+4)$, it must then divide $y^2+1$, since they're equal.

• However, our exploration at the start of this section suggested that a prime of the form $4n+3$ can't divide $y^2+1$!

• So, assuming it is true that only primes of the form $4n+1$ can divide perfect squares plus one ($y^2+1$), then $x\equiv 1 \pmod 4$ doesn't work either.

∎
Enough said; congruences are amazingly powerful. 
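The ingredients of this proof can each be spot-checked by computer. Below is a minimal sketch in plain Python, with function names invented for the illustration: it lists squares modulo 8 and 4, checks a few primes of the form $4n+3$ against $y^2+1$, and searches a finite window for integer points. A finite search proves nothing, of course; it is only a sanity check on the argument.

```python
from math import isqrt

def squares_mod(n):
    """The distinct values of y^2 modulo n."""
    return sorted({(y * y) % n for y in range(n)})

def divides_some_square_plus_one(q):
    """Does q divide y^2 + 1 for some y? Checking one full period suffices."""
    return any((y * y + 1) % q == 0 for y in range(q))

def integer_points(bound):
    """All (x, y) with |x| <= bound, y >= 0 and y^2 = x^3 + 7."""
    points = []
    for x in range(-bound, bound + 1):
        rhs = x**3 + 7
        if rhs < 0:
            continue
        y = isqrt(rhs)
        if y * y == rhs:
            points.append((x, y))
    return points

print(squares_mod(8), squares_mod(4))
print([q for q in (3, 7, 11, 19, 23) if divides_some_square_plus_one(q)])
print(integer_points(1000))
```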


7.7 Exercises 


1. Before reading beyond Section 7.1, pick one of these, and really do some 
exploration and write about it. See Section 7.6 for another interactive 
applet for the first question. 


• Do exploration to try to find a criterion for which primes $p$ there are square roots of $-1$. You will have to examine primes less than 10 by hand to make sure you are right!

• Do exploration to find out anything you can about how many square roots of 1 there are for a given $n$.

2. Figure out how many solutions $x^2\equiv x \pmod n$ has for $n=5,6,7$, and then compute how many solutions there are modulo 210.

3. Finish finding the solutions to the congruences in Examples 7.2.5–7.2.6. Do you notice anything about the answers that suggests a shortcut for finding these particular additional solutions?

4. Find all solutions to $x^2+8\equiv 0 \pmod{121}$ using the method above in Theorem 7.2.3.

5. Solve $f(x)=x^3-x^2+2x+1\equiv 0 \pmod{5^e}$ for $e=1,2,3$.

6. Use summation notation to properly prove

$$(x^k-b^k)=(x-b)(x^{k-1}+bx^{k-2}+\cdots+b^{k-1}).$$

7. Show that the conclusion of Wilson's Theorem fails for $p=10$, and check that it holds for $p=11$ by computing $10!$ and then reducing.

8. Suppose we have the same setup as in Wilson's Theorem, modulo a prime $p$. What is the value of $(p-2)!$ as a function of the modulus?


9. Use Fermat’s Little Theorem to help you calculate each of the following 
very quickly: 


¢ 51237? (mod 13) 
© 34449783 (mod 17) 


e 1234°° (mod 23) 
10. Prove Fermat's Little Theorem using the steps in Theorem 7.5.3 (a standard one in many texts), or any way you would like.

11. Prove that Wilson's Theorem always fails if the modulus is not prime. Hint: use the fact that the modulus $n$ then has factors $m$ other than $1$ or $n$.

12. Prove that it is impossible for $p \mid x^2+1$ if a prime $p$ has $p\equiv 3 \pmod 4$, that is, if $p$ is of the form $4n+3$. (Hard⁷.)

13. Prove that $x^2+y^2=p$ has no (integer) solutions for prime $p$ with that same form $4n+3$.

14. Show that $y^2=x^3+999$ has no (integer) solutions (See [E.2.13, Chapter 10 Review Exercise 5], Exercise 15.7.7). You may assume Fact 13.3.2.

15. In solving $x^2-1\equiv 0 \pmod{2^e}$ for $e\ge 3$ for Fact 7.3.1, find the exact form of the two solutions other than $\pm 1$.


7.8 Counting Proofs of Congruences 


Some number theoretic results require essentially no number theory for their 
proof, but may be tackled using basic ideas from combinatorics, the discipline 
of counting well. The essential idea in all of these types of proofs is to find 
two (or more) ways to count something you care about; with skill (or luck), 
equating these will lead to an algebraic formula that might be quite challenging 
to verify with mere manipulation. Although in this text we do not really 
address partitions, additive number theory, or other beautiful combinatorial 
elements of the discipline, it is worth showing two classic proofs, by counting 
pictures, of the classic theorems in Section 7.5. Both appear in [E.2.11], where I learned of them, though they are both significantly older. In this section, I will try to put them in a unified context in an attempt to lend insight.

⁷ If you absolutely must know, see [E.2.13, Theorem 4.12] or [E.5.1, Theorem 8.6] for a somewhat more general statement proved using Fermat's Little Theorem, which [E.2.13] later uses to prove Proposition 7.6.3.


7.8.1 Counting motivation 


In both cases we will have a natural question about objects situated on a circle, which may be naturally rotated by $2\pi/n$ radians (or $360/n$ degrees). Since such an object will certainly look the same after doing this rotation $n$ times ($2\pi$ radians, or 360°), we can call this an $n$-action, and call the initial rotation a basic $n$-rotation⁸.

In particular, we will want to look at classes of such objects that share some obvious similarity when rotated in this fashion. As an example, consider configurations of $n$ equally spaced points around a circle, two separate pairs of which are connected by a line segment. You can think of this as ways of cutting a round birthday cake, using two cuts going between $n$ equally-spaced candles along the edge⁹. Figure 7.8.1 shows a few examples for $n=7$; notice how the two cuts on the left are rotations of each other, while the others clearly are not.

Figure 7.8.1 Several configurations of two lines on a 7-point circle


Now consider any object of this type, and suppose the object looks the same after some smallest nonzero number $k\le n$ of basic $n$-rotations¹⁰. Then of course that will still be the case after another $k$ of them, and so forth for any multiple $km$ of $k$. But we also know that after $n$ basic $n$-rotations the object is the same.

Now use the Division Algorithm. We have that $n=kq+r$ for some $0\le r<k$. Since we just noted the object is the same after $kq$ basic $n$-rotations, then applying just $r$ of them must also bring it back to its original configuration as well. But we said $r<k$, which is impossible by hypothesis unless $r=0$. So $k$ must be a divisor of $n$, and possibly a proper one.

⁸ This notation is not standard, and is only for use in this section.
⁹ I don't recommend using this at an actual party, since the three or four pieces will likely be quite unequal in size and shape.
¹⁰ That such a number exists is guaranteed by the Well-Ordering Principle as usual.

That this is possible can be noted in Figure 7.8.2, where the left-hand cuts would now be preserved by a mere $k=3$, not only $n=6$, basic 6-rotations.

Figure 7.8.2 Several configurations of two lines on a 6-point circle

But when $n=p$ is prime, there are no proper divisors except 1 itself! So the only configurations which could be rotated non-trivially would be ones that are identical under any number of basic $p$-rotations at all.

Finally, that means that all the configurations which cannot be rotated non- 
trivially must generate p different configurations when rotated — configurations 
which are necessarily different from any others’ rotations, so that they partition 
the set of all the configurations. This yields the key fact we will use in both 
proofs. 


Fact 7.8.3 If p is prime, then the set of configurations which change non- 
trivially under a basic p-rotation can be divided into subsets, each of size p. So 
p divides the size of this set. 
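To watch Fact 7.8.3 happen, one can enumerate configurations and group them by rotation. The sketch below is plain Python and uses colorings of $n$ points as a stand-in for "configurations" (an assumption made for convenience; any objects on the circle would do). All function names are illustrative.

```python
from itertools import product

def rotate(config):
    """Apply one basic n-rotation to a tuple of colors around the circle."""
    return config[1:] + config[:1]

def orbit(config):
    """All configurations reachable from config by repeated rotation."""
    seen, current = set(), config
    while current not in seen:
        seen.add(current)
        current = rotate(current)
    return frozenset(seen)

def orbit_sizes(n, colors):
    """Sorted sizes of the rotation classes of colorings of n points."""
    orbits = {orbit(c) for c in product(range(colors), repeat=n)}
    return sorted(len(o) for o in orbits)

print(orbit_sizes(5, 2))   # prime: only sizes 1 and 5 occur
print(orbit_sizes(6, 2))   # composite: sizes 1, 2, 3 and 6 occur
```

For $n=5$ the classes of size 1 are exactly the one-color configurations, and everything else falls into classes of size 5, as the fact predicts.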


7.8.2 The combinatorial proofs 


Solomon Golomb provided the following creative proof of Fermat's Little Theorem as a classroom note¹¹ in [E.7.41]. The proof is of the statement in the form we will see later in Exercise 9.6.3:

$$a^p\equiv a \pmod p.$$

It has been reused in many texts and spread throughout the internet as the ‘pearl’ or ‘necklace’ proof¹².

¹¹ Thanks to JSTOR, you can access the original publicly at www.jstor.org/stable/2309563. On a side note, it is amazing today to think that a professor at MIT was the editor of the classroom notes section of the Monthly in 1956. Times have changed.

¹² It is worth noting that the case n = 2 may be thought of stating a fact about musical chords on a $p$-note scale. See Hook's excellent introduction (search www.mtosmt.org for ‘tetrachord hook’) if you know a little group theory.

Combinatorial proof of Theorem 7.5.3. Suppose that at each of $p$ equally spaced points around a circle we place a bead, with $a$ colors available for each bead. “Since each of the beads can be chosen” in $a$ different ways, there are $a^p$ possible colorings.

However, if we use only one color of bead, then that coloring doesn't change under a basic $p$-rotation. So the total number of relevant configurations in Fact 7.8.3 is $a^p-a$, which implies that $p \mid a^p-a$ or

$$a^p\equiv a \pmod p.$$

∎

Over a century ago, Robert Carmichael (whom we will meet again in Definition 12.2.9) gave the following very interesting proof¹³. As Golomb points out in his article, it is of a very similar nature to the previous one, which motivates the unified presentation here¹⁴.
Combinatorial proof of Theorem 7.5.1. We start with Carmichael's introduction.

    Let $p$ points be distributed at equal intervals on the circumference of a circle. The whole number of $p$-gons which can be formed by joining up these $p$ points in every possible order is evidently

    $$\frac{1}{2p}\,p(p-1)\cdots 3\cdot 2\cdot 1.$$

Indeed, if we start by picking one of $p$ starting points on the circle and then join the rest in some order, there are $p!$ ways to do so, but we then need to divide by the number of starting points of such a configuration, as well as the two directions we could have chosen to start. Further, to use Fact 7.8.3 we need to subtract the ones like the one on the left in Figure 7.8.4, of which there are $\frac{p-1}{2}$ since from a starting point that is the number of distances (right or left) one can go to the next point (and from then on it continues identically so that any rotation will keep it unchanged).

Figure 7.8.4 Connecting 7 points on a circle various ways

So we have that

$$\frac{1}{2p}\,p(p-1)(p-2)\cdots 3\cdot 2\cdot 1-\frac{1}{2}(p-1)\equiv 0 \pmod p$$

multiplying by two and simplifying yields

$$(p-1)!-p+1\equiv 0 \pmod p$$

which immediately implies

$$(p-1)!\equiv -1 \pmod p$$

as desired. ∎

¹³ The textbook in which this proof of Wilson's Theorem occurs is now available freely via Project Gutenberg.

¹⁴ Unbelievably, this proof also has connections to music theory, namely enumerating tone rows for an $n$-note scale. See [E.7.46] (or at www.jstor.org/stable/3647771).
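Carmichael's two counts can be confirmed by brute force for small primes. In this plain Python sketch (names are my own), a $p$-gon is stored as its set of undirected edges, so the starting point and direction are forgotten automatically, and a basic $p$-rotation just adds 1 to every vertex label.

```python
from itertools import permutations

def polygons(p):
    """All p-gons on the points 0..p-1, each as a frozenset of undirected edges."""
    found = set()
    for rest in permutations(range(1, p)):
        order = (0,) + rest
        edges = frozenset(
            frozenset((order[i], order[(i + 1) % p])) for i in range(p)
        )
        found.add(edges)
    return found

def rotate_polygon(edges, p):
    """Apply one basic p-rotation to every vertex of the polygon."""
    return frozenset(frozenset((v + 1) % p for v in edge) for edge in edges)

def count_polygons(p):
    """Return (number of p-gons, number unchanged by a basic p-rotation)."""
    all_polygons = polygons(p)
    fixed = sum(1 for poly in all_polygons if rotate_polygon(poly, p) == poly)
    return len(all_polygons), fixed

for p in (5, 7):
    total, fixed = count_polygons(p)
    print(p, total, fixed, (total - fixed) % p)
```

For $p=7$ this reports 360 polygons, of which 3 are unchanged by rotation, matching $\frac{1}{2p}p!$ and $\frac{p-1}{2}$ in the proof.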


Remark 7.8.5 For those who know a little graph theory, this proof may be streamlined. The number of directed cyclic graphs on these points is $(p-1)!$, and similarly there would be exactly $p-1$ directed cyclic graphs which remain unchanged under a basic $p$-rotation.

Remark 7.8.6 As a note to instructors, though we do not define group actions in this text, of course an $n$-action is really a $\mathbb{Z}_n$-action, using the terminology of Definition 8.1.1. Indeed, these computations are all just special cases of the Burnside Lemma/Cauchy-Frobenius Theorem¹⁵, but without the annoyance of having to actually compute very many fixed points, and without the bother of determining the number of orbits.

These are certainly not the only combinatorial proofs of congruences. See [E.7.27] for a recent proof of Wilson's Theorem using two different ways of counting the functions of the set $\{1,2,\ldots,p\}$ onto itself. Like many presentations of these two theorems, it uses Fermat's Little Theorem to prove Wilson's Theorem, rather than the other way around as we did it in Section 7.5.

Golomb only asks what we (and [E.2.11]) show explicitly; where did we use 
that p is prime? It is of course in the division algorithm, when finding how 
many basic n-rotations suffice to preserve the figure. The beauty of the proofs 
in this section is that they rely directly only on the division algorithm and 
primality, nothing more. 


Summary: First Steps With General Congruences 


Although we cannot as easily fully solve more general congruences than 
linear ones, there are many useful and elementary results to explore. 


1. As a prelude, we explore Question 7.1.1 about when we have square roots of $\pm 1$, modulo $n$.


2. Can we use some of the methods from linear congruences for polynomials? 


e We can combine solutions to polynomials in a similar way to the 
Chinese Remainder Theorem (Fact 7.2.2). 


e In Hensel’s Lemma we see how to use a solution modulo a prime 
power to create a solution modulo a higher power of the same prime. 


3. A key approach in solving congruences is to remember that the nature of 
the solutions may also be expressed in terms of a congruence. Fact 7.3.2 
is a first good example of this, giving a complete analysis of square roots 
of one. 


4. We next see in Lagrange's Theorem for Polynomials that when our modulus is prime, solutions of polynomials are limited more closely by our previous experience.

5. Two towering theorems giving theoretical tools to harness more complex congruences are Wilson's Theorem and Fermat's Little Theorem.

6. Finally, we explore Mordell curves again in an effort to motivate a deeper understanding of Epilogue: Why Congruences Matter.

The Exercises focus on polynomial congruences, but include a little practice of Fermat's Little Theorem. After this we have alternate combinatorial proofs provided of Fermat's Little Theorem and Wilson's Theorem; see especially the section on Counting motivation.

¹⁵ mathworld.wolfram.com/Cauchy-FrobeniusLemma.html
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Chapter 8 


The Group of Integers Modulo n


This chapter does not do any number theory, per se. Yet it is at the heart of 
the text. We introduce two powerful methods to deal with integers modulo n 
— visualizing them graphically, and the language of group theory. 

There is no prerequisite in either case; do not feel worried if you have not 
encountered algebraic structures like groups before. We will only take and 
introduce what we need, and refer back to fundamental properties often. 


8.1 The Integers Modulo n 


8.1.1 Definition 


It is time for us to finally define what we have been working with for quite a 
while now. 


Definition 8.1.1 Integers Modulo n. For a positive integer $n$, the set of equivalence classes of integers modulo $n$ is called the integers modulo $n$. We denote it $\mathbb{Z}_n$. That is,

$$\mathbb{Z}_n=\{[0],[1],[2],\ldots,[n-2],[n-1]\}.$$

In the case where $n=p$ is a prime, we usually write $\mathbb{Z}_p$. (For those who have had an abstract algebra course, this may be different notation than you have used, but we will consistently use this one.) ◊

This friendly number system will become a good acquaintance, if not friend, throughout the rest of the course. We'll explore it soon, but first let's see some of the basic properties.

As it turns out, $\mathbb{Z}_n$ has several very interesting properties. Like all of our number systems in this class, you can add and multiply elements of $\mathbb{Z}_n$ (we call something like that a ring). This is true because of our earlier proof of well-definedness for addition and multiplication in Proposition 4.3.2.

As a first step in visualizing, we can make an addition table. (See Figure 8.1.2 or the interact after it.) This is not very interesting. But in some sense, it is interesting that it isn't interesting. Does that make any sense?

 +  | [0] [1] [2]
[0] | [0] [1] [2]
[1] | [1] [2] [0]
[2] | [2] [0] [1]

Figure 8.1.2 Addition table for $\mathbb{Z}_3$


@interact
def addition_table_(n=(11,[2..50])):
    P=[[mod(a,n)+mod(b,n) for a in [0..n-1]] for b in [0..n-1]]
    pretty_print(html("The addition table for modulus $%s$"%(n,)))
    pretty_print(html(table(P, header_row = True, frame = True)))


The top row and left column may be considered as a list of a and b. Any 
ideas about patterns here? 


It’s also possible to make a multiplication table. (See Figure 8.1.3 or the 
interact after it.) This makes things a little more interesting. 
 ×  | [0] [1] [2]
[0] | [0] [0] [0]
[1] | [0] [1] [2]
[2] | [0] [2] [1]

Figure 8.1.3 Multiplication table for $\mathbb{Z}_3$


@interact
def _(n=(11,[2..50])):
    P=[[mod(a,n)*mod(b,n) for a in [0..n-1]] for b in [0..n-1]]
    pretty_print(html("The multiplication table for modulus $%s$"%(n,)))
    pretty_print(html(table(P, frame=True)))


Again, notice that the columns and rows are both from $0$ to $n-1$; this is standard. For now we'll usually just use the set of least nonnegative residues to represent $\mathbb{Z}_n$; recall that this is $\{[0],[1],[2],\ldots,[n-2],[n-1]\}$.

Are there any patterns you notice here? 

There is at least one observation that is curious. For some moduli, the only 
zeros are where we expect them, in the top row and left column. For others, 
they are in other spots. 
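If you are not at a computer with Sage, the same observation can be made with a few lines of plain Python. This is only a sketch with made-up function names; it builds the multiplication table as a list of lists and reports any zeros that are not in the top row or left column.

```python
def multiplication_table(n):
    """Row a, column b holds a*b reduced modulo n."""
    return [[(a * b) % n for b in range(n)] for a in range(n)]

def unexpected_zeros(n):
    """Pairs (a, b), both nonzero, whose product is 0 modulo n."""
    table = multiplication_table(n)
    return [(a, b) for a in range(1, n) for b in range(1, n) if table[a][b] == 0]

print(unexpected_zeros(7))
print(unexpected_zeros(6))
```

Try several moduli and see whether you can predict in advance which ones will give an empty list.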


8.1.2 Visualization 


What's even better is to see this visually! I still can't get over how easy it is for me to do this in Sage (and other math programs), such as in the following graphic and interact. It is so cool that my (non-mathematician) wife says, “What's that? It's neat!” I wish more people could experience this joy of beauty in math.

Figure 8.1.4 Colored multiplication table for $n=7$

@interact
def multiplication_table_plot(n=(7,[2..50])):
    P=matrix_plot(matrix(n,[mod(a,n)*mod(b,n) for a in srange(n) for b in srange(n)]),cmap='jet')
    show(P, figsize=7)

How does one interpret this graphic? The $a$ row and $b$ column give the color corresponding to $a\cdot b \pmod n$. That means the first (0th) column is the color for $a\cdot 0=0$ and the second (1st) column gives the colors of each element $a\cdot 1=a$ of $\mathbb{Z}_n$. Since zero times anything is zero, that gives us a lot of that color (deep blue in the default) along two edges.

Can you see the difference between prime and composite moduli better 
now? 


8.1.3 Inverses 


Let’s focus on the tables/graphs for when n = p a prime. There’s at least one 
interesting observation we can make about them. Every row and every column 
(other than the ones corresponding to 0) has the entry 1 in it. (That’s the 
deepest nonzero blue in the default coloring.) 

You can’t necessarily say this about other numbers, so let’s translate this 
into notation. 


Fact 8.1.5 When $p$ is prime, every nonzero element of $\mathbb{Z}_p$ has an inverse.
Proof. If $\gcd(a,n) = 1$, then $ax \equiv b$ (mod $n$) has a unique solution in $\mathbb{Z}_n$. So
if $n = p$ is prime, then $\gcd(a, p) = 1$ always, except when $a \equiv 0$.
Now we let $b = 1$, and finding $x$ becomes the same as finding the inverse
of $a$ (recall Definition 5.3.4). So for prime moduli, every nonzero element has
a unique inverse in $\mathbb{Z}_p$.
(In algebraic nomenclature, this means $\mathbb{Z}_p$ is a field, yet another example
of bizarre but fun math terminology.)
What was the command again to get an inverse? 


inverse_mod(26,31)


It turns out there is an even easier way to get at this in Sage than the one
I used last time! In retrospect, it makes sense.


c = mod(26,31)
c^-1


c = mod(26,31)
c*c^-1


Go back to the graphics or tables. Can you “see” that there is (exactly one)
inverse for every non-zero element of $\mathbb{Z}_p$?
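If you would rather count than squint at colors, a brute-force check does the job. This is a minimal sketch in plain Python, not Sage; `inverses_mod` is a hypothetical helper name, and it simply tries every candidate $x$ for each nonzero $a$.

```python
def inverses_mod(p):
    """Map each nonzero residue a to the list of x with a*x congruent to 1 modulo p."""
    return {a: [x for x in range(1, p) if (a * x) % p == 1] for a in range(1, p)}

table = inverses_mod(31)
print(table[26])                                    # all inverses of 26 modulo 31
print(all(len(xs) == 1 for xs in table.values()))   # exactly one inverse each?
```

Running the same function with a composite modulus shows rows with no inverse at all, which matches what the tables suggest.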


8.2 Powers 


Let’s continue to restrict ourselves to looking at Zp, the integers modulo some 
prime p, for a bit longer. This will enable us to get a little more detailed in our 
exploration. We eventually want to explore solutions to congruences modulo 
primes and prime powers. 

Let’s begin by exploring powers. Powers are particularly important, since 
polynomials are constructed from them. 

For instance, if $a = 2$ and $p = 7$, the powers of $a$ begin with $1, 2, 4, 1$
since $2^0 = 1$ and $2^3 = 8 \equiv 1$. The following interact allows (not yet colored)
exploration of many powers $a^n$ modulo $p$ for various primes $p$ and bases $a$.


@interact(layout=[['p','a']])
def _(p=(7,prime_range(50)),a=(3,[0..50])):
    b=mod(a,p)
    top=ceil(2*p/10)*10
    pretty_print(html("If we look at some of the powers of $%s$"%(a,)))
    pretty_print(html("modulo the prime $%s$, we get:"%(p,)))
    pretty_print(html("<ul>"))
    for m in [0..top]:
        pretty_print(html(r"<li>$%s^{%s}\equiv %s\text{ (mod }%s)$</li>"%(a,m,b^m,p)))
    pretty_print(html("</ul>"))


Do you see any patterns? It’s probably a little early to try to come up with 
potential theorems, but there should be at least some patterns you see. Do 
you maybe even see any theorems we have already proved in here? 

One of the biggest patterns is hard to see in this format, but is the simplest.
Given a prime $p$, you should get the same answers whenever $a \equiv a'$ (mod $p$).
(Recall this fact was the core of the proof of Fact 6.1.4.) So we should really
just restrict ourselves to looking at $0 < a < p$.


8.2.1 Returning to visualizing 


Still, this is a lot of data to assimilate. Is there some way to think about it 
differently? 
This next interact is super-cool, because it combines the short, color-coded
format with the much less familiar material of powers.



Figure 8.2.1 Colored table of powers modulo n = 11 


import matplotlib.pyplot as plt
from matplotlib.ticker import IndexLocator, FuncFormatter
@interact
def power_table_plot(p=(11,prime_range(100)[2:])):
    mycmap = plt.get_cmap('gist_earth',p-1)
    myloc = IndexLocator(floor(p/5),.5)
    myform = FuncFormatter(lambda x,y: int(x+1))
    cbaropts = { 'ticks':myloc, 'drawedges':True, 'boundaries':srange(.5,p+.5,1)}
    P=matrix_plot(matrix(p-1,[mod(a,p)^b for a in range(1,p) for b in srange(p)]), cmap=mycmap, colorbar=True, colorbar_options=cbaropts, ticks=[myloc,myloc], tick_formatter=[None,myform])
    show(P, figsize=6)


The default coloring needs some explanation, as the colors are not the same as in
the previous example. The $a$ row and $b$ column gives the color corresponding
to $a^b$ (mod $p$), where the colors are given by the colorbar on the right. From
this we see that the first (0th) column is all the color for $a^0 = 1$, and the second
(1st) column gives the colors of each element $a^1 = a$ of $\mathbb{Z}_p$. For instance, since
$3^4 \equiv 4$ (mod 7) in the initial example, it has the color corresponding
to 4.

(As far as I know, this representation first appears in Wagon and Bressoud’s
excellent computational number theory text [E.4.7]. The PascGalois¹ project
has related visualizations.)


Sage note 8.2.2 Colorful options. If you don’t like the colors, you can 
change the word in the quotes in the command mycmap = plt.get_cmap(...) 
(currently 'gist_earth'); for instance, 'gray' gives a grayscale plot, which is 
most appropriate for certain vision-impaired users. Some others you could try 
are 'Oranges' or 'hsv' or ... Well, see the next Sage cell if you really want to 
know all of them! 


for c in colormaps:
    print(c)


Blues
BrBG
BuGn
tab20b_r
tab20c_r

¹ www.pascgalois.org/

What color patterns can you see here? To say it another way, what potential 
theorems do you see? (Again, do you see any that we already have discussed?) 
In a classroom or self-study situation, I strongly recommend thinking about
this until coming up with some nice potential theorem regarding whether there
are any patterns in $a^b$ (mod $p$) that hold for all $p$ or all $a$ or all $b$, or something.


8.3 Essential Group Facts for Number Theory 


Many of the bookkeeping issues which arise in number theory can be made 
much easier by changing our language and introducing a small amount of ab- 
straction. That abstraction is the concept of group. These notes will introduce 
this concept in the most basic way possible, with only the minimum needed to 
translate many difficult arguments into simpler language. 


8.3.1 Step-by-step notions to the definition 


We will take an approach that starts with the familiar and adds properties 
until we reach our goal. 


8.3.1.1 Sets 


Sets are just what you think. They are collections of (mathematical) stuff. 

In our uses of groups, we will exclusively be concerned with sets that are 
collections of numbers, like P, the set of primes, and Z, the set of integers, 
or $\mathbb{Z}_n$, the set of equivalence classes of integers modulo $n$. But it’s helpful to
think more generally. 


8.3.1.2 Binary operations 


A binary operation is a set with a multiplication table on it. That’s it.
Usually books call it $*$ or something like that, and then define a binary
operation on the set $S$ to be a function from $S \times S$ to $S$.

• Usually this would be (say) normal addition or multiplication on numbers,
though it could also be subtraction.

• On the other hand, if $S$ is the set of continuous functions on $\mathbb{R}$, the
operation could be composition of functions, $f \circ g$.


Notice that if our set is Q and our operation is division, we don’t have a 
full table. The essential thing is that it’s a set with a table or rule for the 
operation. 
8.3.1.3 Closed operations

A binary operation is called closed if you don’t get anything outside the set
with your operation. This is important because it’s easy to break this.

• If you are adding two positive numbers, for instance, you always get a
new positive number.

• Is this still true if you subtract two positive numbers from each other?

• This also can happen with division, right? You have to look at $\mathbb{Q}$, and
then you have to be careful because of the previous point.

• For a more complicated example, let $S$ be the set of $2 \times 2$ matrices with
determinant 1; if you add two of them, your determinant might change
a lot.

• On the other hand, if you multiply two such matrices, you’re golden; the
determinant will still be 1.


8.3.1.4 Associative operations 


An operation is associative if it doesn’t matter how you put parentheses in. 

This is not an algebra course, so I won’t harp on this — everything we do 
will satisfy it in obvious ways. But it’s worth noting that exponentiation is not 
associative, so it’s not a trivial condition. 


Example 8.3.1 


$2^{(2^3)} = 2^8 = 256$ but $(2^2)^3 = 4^3 = 64$.


8.3.1.5 Identity 
Much more important is whether your operation has an identity element. 
You have seen this many times before in addition and multiplication. 


$$a + 0 = a = 0 + a \text{ and } a \cdot 1 = a = 1 \cdot a.$$


When we turn this into abstract math, we say that an identity for a general 
operation * on a set S is an element, conveniently called e, which has the very 
nice property that if you * by it, you get the same thing back. 


• That is, $e * a = a = a * e$ for any $a \in S$.

• The identity matrix under matrix multiplication is another example.

• By the way, if there is an identity, there’s only one, which is sometimes
useful to know.

Example 8.3.2 Here is a more interesting example. Let your set be the set of
all rotations of a square which leave it facing the same way. That is, rotation by
90 degrees to the left, 180 degrees right, etc. (Think of a child’s block sorter.)

• The binary operation combining two (possibly different) rotations would
be to first do one rotation, and then the other one.

• Then an identity element $e$ of this is just to leave the block alone!


This is sort of weird at first, but an extremely important example. 
8.3.1.6 Inverses


Almost there! Let’s keep thinking about that last example. Say I turn the 
block 90 degrees to the right, then I realize I made a horrible mistake and 
want to get back to the original position. Is there anything I can do, short of 
buying a new square block? 

Of course there is! Just turn it back 90 degrees to the left. So if I call the
first move 90R and the second one 90L, I can say that 90R $*$ 90L $= e$, since
the net effect is the same.

Generalizing this, if $a$ is an element of your set $S$ and there is another
element $a'$ such that

$$a * a' = e = a' * a,$$

then we call $a'$ an inverse of $a$.


• The absolute prototype of this is negative numbers. That is, for any
number $n$, if you add $-n$, then you get zero!

• The same thing happens a lot; for matrix multiplication, the inverse
matrix would be the operation inverse.

• For rational numbers (not including zero, of course), the reciprocal would
be the multiplicative inverse.


But notice that in both of these cases not every mathematical object has an 
inverse with respect to every operation! A matrix with determinant zero does 
not have an inverse matrix. In Q under multiplication, zero has no inverse. 


8.3.2 What is a group? 


Definition 8.3.3 Group. If a set and binary operation on that set is closed
and associative with identity and inverses for every element, we call that set a
group.


Example 8.3.4 The most excellent examples of this are the following: 
• $\mathbb{R}$, $\mathbb{Q}$, $\mathbb{Z}$ under addition with zero as identity

• The sets $\mathbb{R}$ and $\mathbb{Q}$ except zero (written as $\mathbb{R} \setminus \{0\}$ and $\mathbb{Q} \setminus \{0\}$, respectively)
under multiplication with 1 as identity

• $\mathbb{Z}_n$ under addition with $[0]$ as identity. For example, in $\mathbb{Z}_3$, every element
has an inverse; $[0]' = [0]$, $[1]' = [2]$, and $[2]' = [1]$, because $[0] + [0] =
[0] = [1] + [2]$.


Remark 8.3.5 If we are talking about any old group, we just call it G. 
Also, after a while, it gets boring to always type *, and instead we just use 
normal multiplication notation, writing x * y = xy. 


Example 8.3.6 A preview of what’s to come. We noted that Q \ {0} 
is a group under multiplication, with 1 as the identity. Is there something 
analogous for $\mathbb{Z}_n$?

Indeed there is, and we will see it soon. But notice that things will be more 
complicated. 


• For instance, in $\mathbb{Z}_3$, both $[1]$ and $[2]$ have multiplicative inverses (in fact,
themselves), so $\mathbb{Z}_3 \setminus \{[0]\}$ is a (multiplicative) group, just like $\mathbb{Q} \setminus \{0\}$.

• But in $\mathbb{Z}_4$, both $[0]$ and $[2]$ do not have multiplicative inverses, so it would
not make sense to say that $\mathbb{Z}_4 \setminus \{[0]\}$ is a group.


That extra complication is one reason we need to think hard about these things! 
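For small finite sets, the four requirements in Definition 8.3.3 can be checked by brute force. The following is a minimal sketch in plain Python; the function `is_group` is hypothetical (it is not a Sage command), and it is only practical for tiny examples since it tries every triple of elements.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of closure, associativity, identity and inverses."""
    elements = list(elements)
    # Closed: every result stays inside the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associative: it does not matter how parentheses are placed.
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity: some e with e*a = a = a*e for every a.
    identities = [e for e in elements
                  if all(op(e, a) == a == op(a, e) for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every a has some b with a*b = e = b*a.
    return all(any(op(a, b) == e == op(b, a) for b in elements) for a in elements)

print(is_group(range(3), lambda a, b: (a + b) % 3))     # Z_3 under addition
print(is_group([1, 2], lambda a, b: (a * b) % 3))       # nonzero classes modulo 3
print(is_group([1, 2, 3], lambda a, b: (a * b) % 4))    # nonzero classes modulo 4
```

The last call fails already at closure, since $[2] \cdot [2] = [0]$ modulo 4 lands outside the set.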


8.3.3 Properties of groups we will need 


The reason for introducing groups in a course which does not presume previous 
exposure to algebra is that it just makes things simpler. We will start here
with familiar facts in a new guise, and then work our way to some facts which 
will prove invaluable. 


8.3.3.1 Solutions to equations 


Since a group has inverses, we can solve very simple ‘linear’ equations in them. 
This is stated as 


$$a * x = b \text{ is solved by } x = a' * b \ (= a^{-1} * b).$$


For instance, over R, a+ x = b always has a solution for any real numbers 
a,b. We just take x = (—a)+b, where —a is the inverse for the group operation 
of a (as mentioned above). 

More important to us is the fact that in $\mathbb{Z}_n$, there are solutions. The
operation is still $+$, so we have $a + x \equiv b$ (mod $n$) solved by $x \equiv (-a) + b$
(mod $n$).

This doesn’t seem much more interesting, but you will see soon why this 
concept is so important. 


8.3.3.2 Inverses of product 
We can give a formula of sorts for the inverse in any group; see Exercise 8.4.8.

Fact 8.3.7 The inverse of $ab$ is $b^{-1}a^{-1}$.
Proof. First, $b^{-1}$ and $a^{-1}$ exist, so $(b^{-1})(a^{-1})$ exists. Next, if $ab \cdot x = 1$, then

$$(b^{-1}a^{-1})(ab)x = (b^{-1}a^{-1}) \cdot 1 = b^{-1}a^{-1};$$

we use associativity to simplify

$$(b^{-1}a^{-1})(ab)x = (b^{-1})(a^{-1}a)bx = (b^{-1}b)x = 1 \cdot x = x,$$

which gives $x = b^{-1}a^{-1}$.
(Keep in mind that in our main example $ab \cdot x \equiv 1$ is the notion of equality we
are using in finding and using these inverses.)


8.3.3.3 Finite groups 


A group can have finitely many or infinitely many elements. Most of our normal 
ones, such as Z,Q,R, matrix groups, are infinite. 

But the ones we’ll use in this text will mostly have finitely many elements. 
This is because we are counting each equivalence class, like [0], [1], [2] in (mod 
3) arithmetic, as just one element. 

A group with finitely many elements is called, unimaginatively, a finite 
group. 
8.3.3.4 Order of a group

Definition 8.3.8 The number of elements of a finite group is called the order
of the group.

For any old group $G$, we use $|G|$ as notation for its order.


Example 8.3.9 So if we are talking about Z3, it has 3 elements, so it has 
order 3 (unsurprisingly) and we write |Z3| = 3. 


8.3.3.5 Order of an element 


This is a tougher concept. Suppose you have some element, such as $[1] \in \mathbb{Z}_3$. If
you just keep adding [1] to itself, eventually you get back to zero, right? After 
all, 

[1] + [1] + [1] = [0] (mod 3). 


Take a finite group G with order |G| = n. We will bring the concept of 
order to elements, not just groups. 
First, list all elements of the group: 


$$\{e = x_1, x_2, \ldots, x_n\}$$

Now let’s take an element $x$, and start operating on it by itself. What I mean
by this is listing $x$, $x * x = x^2$, $x^3$, .... (Don’t be confused by the power notation
alternating with addition notation; $\mathbb{Z}_n$ has two operations, so we keep $+$ there,
but in a general group we use multiplicative notation.)

Here is the key. There are only finitely many elements in the group, so by
$x^{n+1}$ at the latest, at least two of these ‘powers’ will be equal. (This argument,
that if you fit $n + 1$ objects into $n$ slots then there must be a repeat, is called
the pigeonhole principle, among other names.)

To be concrete, let’s say $x^s = x^t$, with $s < t$. Now we can do a very curious
thing. Take the inverse of $x$, written $x^{-1}$. If we multiply it together $s$ times,
we get $(x^{-1})^s$, which we can write $x^{-s}$. Then multiply $x^s = x^t$ by $x^{-s}$:

$$x^{-s} x^s = x^{-s} x^t, \text{ or } e = x^{t-s}.$$

We are almost there! This means there is a positive integer $k$ such that
$x^k = e$. By the Well-Ordering Principle (Axiom 1.2.1), there is a least such
integer. This integer, associated to a specific element of the group, is what we
have been aiming for.

Definition 8.3.10 For a group element $x \in G$, the least (positive) integer $k$
such that $x^k = e$ is called the order of the element $x$. We write it $|x|$, by
analogy with the order of a group.


Example 8.3.11 For example, in $\mathbb{Z}_6$, look at the element $[4]$. We see that
$[4] + [4] + [4] + [4] + [4] + [4] \equiv [0]$ (mod 6), but $[4] + [4] + [4] \equiv [0]$ (mod 6) too.

So while 6 might look like a possibility for the order of $[4]$, we see that clearly
3 is actually the smallest (positive) number of times to add $[4]$ to get $[0]$. So
$|[4]| = 3$.
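The definition of order translates directly into a loop: keep adding until you hit $[0]$, and count the steps. Here is a minimal sketch in plain Python for the additive group $\mathbb{Z}_n$; the name `additive_order` is hypothetical and chosen just for this illustration.

```python
def additive_order(a, n):
    """Least positive k such that adding [a] to itself k times gives [0] in Z_n."""
    total, k = a % n, 1
    while total != 0:
        total = (total + a) % n
        k += 1
    return k

# Orders of [0], [1], ..., [5] in Z_6.
print([additive_order(a, 6) for a in range(6)])
```

It is worth comparing the list of orders with the divisors of 6 before reading on.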


8.3.3.6 The connection 


Here comes the coolest part, where we connect the two concepts of order. We 
will definitely use Theorem 8.3.12 in proving various theorems. 
Take a look at any old element $x \in G$. If $x$ has order $m$, then there are (at
least) $m$ distinct elements of $G$,

$$\{x, x^2, x^3, \ldots, x^{m-1}, x^m = e\}.$$

Now take any other element not in this subset, $y$, and look at the set

$$\{xy, x^2 y, x^3 y, \ldots, x^{m-1} y, ey = y\}.$$

Note that these are also all distinct elements of the group. Are any of them
also included in the first set (powers of $x$)?

Suppose that some $x^i y$ is the same as some $x^j$. That would mean $x^i y = x^j$,
so multiplying by $x^{-i}$ we get

$$x^{-i} x^i y = x^{-i} x^j.$$

That would mean $y = x^{j-i}$, a contradiction since we said $y$ isn’t a power of $x$.
Hence the new elements form a disjoint set from the previous set.

Now find an element $z$ not in either set, and do the same thing. Then the
set

$$\{xz, x^2 z, x^3 z, \ldots, x^{m-1} z, ez = z\}$$

will be disjoint from the other sets, and all its elements will still be distinct. 
Since G is finite, eventually doing this process again and again will fill out G 
completely. 


Theorem 8.3.12 Lagrange’s Theorem on Group Order. The order of
any element $x$ of $G$ divides the order of the group itself. We can write this as

$$|x| \mid |G|.$$

Proof. Examine the above argument. We have a number of subsets of $G$, all of
size $m$, which exactly fill out $G$, which has size $n$. This forces that $m$ divides
$n$ as integers.

Example 8.3.13 For example, above we saw that $[4] \in \mathbb{Z}_6$ has order 3, and
of course $\mathbb{Z}_6$ itself has order 6. You can check for yourself that 3 divides 6, so
that $|[4]| \mid |\mathbb{Z}_6|$.

We already had a theorem with Lagrange’s name, but that doesn’t usually 
stop whoever names theorems from giving them names. Lagrange was one of 
the most important mathematicians of the eighteenth century; see Historical 
remark 16.3.7 for more about him. 
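The "fill out $G$ with equal-sized pieces" argument behind Lagrange’s theorem can be watched in action on a small group. The following is a minimal sketch in plain Python, assuming the group is given as a list of elements together with a two-argument operation; `coset_partition` is a hypothetical name for this illustration.

```python
def coset_partition(elements, op, x):
    """Split a finite group into blocks {x^k * y}, as in the argument above."""
    # Powers of x: x, x^2, ..., stopping once they start to repeat.
    powers, current = [], x
    while current not in powers:
        powers.append(current)
        current = op(current, x)
    blocks, covered = [], set()
    for y in elements:
        if y in covered:
            continue  # y already lies in an earlier block
        block = sorted({op(p, y) for p in powers})
        blocks.append(block)
        covered.update(block)
    return blocks

add_mod_6 = lambda a, b: (a + b) % 6
print(coset_partition(range(6), add_mod_6, 4))
```

With $x = [4]$ in $\mathbb{Z}_6$ (written additively), each block has as many elements as the order of $[4]$, and the blocks do not overlap.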


8.3.3.7 Cyclic groups 
There is another, simpler concept to keep in mind. 


• If $G$ has order $|G| = n$ and there is some element $x \in G$ such that $x$
has order $|x| = n$ as well, then it must go through all the possible other
elements of $G$ before hitting $x^n = e$.

• This element, whose powers run through all $n$ elements of $G$, is called a
generator of the group.

• Any group that has a generator (again, an element whose powers hit all
elements of the group) is called a cyclic group.

It is pretty clear, I hope, that $\mathbb{Z}_n$ is a cyclic group with generator $[1]$, for
any $n$. But not every group is cyclic! See Exercises 8.4.9 and 8.4.10.
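To see which elements of $\mathbb{Z}_n$ actually generate it, one can list the multiples of each residue and check whether they cover everything. This is a minimal sketch in plain Python; `additive_generators` is a hypothetical helper, not a Sage command.

```python
def additive_generators(n):
    """Residues a whose multiples a, 2a, 3a, ... run through all of Z_n."""
    return [a for a in range(n)
            if len({(k * a) % n for k in range(1, n + 1)}) == n]

print(additive_generators(6))
print(additive_generators(7))
```

Comparing the two outputs hints at how the answer depends on the modulus.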
There can be more than one generator; going back to $\mathbb{Z}_6$, note that
8.3.3.8 Abelian groups


This won’t come up too much, but it is important to note that most of the 
groups we will encounter in this course have one additional special property. 

Namely, it doesn’t matter what order you do the operation in. (Such an 
operation is called commutative.) 


• For instance, clearly (in any $\mathbb{Z}_n$) it is true that $[1] + [2] = [2] + [1]$, or
really for any elements at all.

• Not all groups have this property; you may recall that multiplying matrices
in two different orders may yield two different answers.

• If your group has this property, then it is clear that Fact 8.3.7 reduces
to $(ab)^{-1} = a^{-1}b^{-1}$.

Any group which has this property, that $a * b = b * a$ for all $a, b \in G$, is called
an Abelian group. Just keep it in mind!


8.4 Exercises 


1. Write out the addition table for $\mathbb{Z}_{11}$ completely, by hand.

2. Write out the multiplication table for $\mathbb{Z}_{11}$ completely, by hand.


3. Find some conjecture/pattern to state about multiplication tables, based 
on any of the interacts in this chapter. 


4. Find some conjecture/pattern to state about values of $a^n$ (mod $p$), for $p$
prime and $0 < n < p$ you discovered using the interact in Subsection 8.2.1.
This could be anything profounder than

$a^0 \equiv 1$ (mod $p$) or $1^n \equiv 1$ (mod $p$)


for all prime p and for all n, but should at least be some pattern you 
tested for a number of values. 
5. Give an example of a non-closed binary operation. 


6. In Example 8.3.2, what is the order of the group element which is rotation 
by ninety degrees to the left? What is the order of rotation by 180 degrees? 


7. Consider a similar setup to Example 8.3.2, but with a regular hexagon. If
$R$ is rotation of the hexagon by sixty degrees to the right, verbally describe
$R^{-1}$. How would you describe $R^3$ verbally? What is the order of $R$?

8. Without using other resources, explain why Fact 8.3.7 is known as the 
“socks and shoes” property. 

9. Give an informal argument that Q (as a group under addition) is not 
cyclic. 

10. Give an example of a cyclic group which is not finite. 

11. (Only if you have some experience with matrices.) Find two $2 \times 2$ matrices
$A$ and $B$ which have non-zero determinant such that $A \cdot B \neq B \cdot A$.
Conclude that the group of 2 x 2 matrices with non-zero determinant is 
not Abelian. (It is a group, because all such matrices have an inverse 
matrix.) 
Summary: The Group of Integers Modulo n


In this chapter, it is high time to introduce a few algebraic innovations that 
allow a unified presentation of our ideas about modular arithmetic. 


1. Most importantly, we officially define Integers Modulo n and reconfigure 
what an inverse is in Fact 8.1.5. We not only make tables of operations, 
but in Subsection 8.1.2 we start visualizing them! 


2. We will see later that the visualization of powers in Figure 8.2.1 is ex- 
tremely powerful. 


3. In the final section, we build our way up to the definition of a group in 
Definition 8.3.3, so that in the future we can use the important ideas of 
the Order of an element of a group and Lagrange’s Theorem on Group 
Order. 


The Exercises give a chance to try some algebraic theory we otherwise avoid 
in this course. 
Chapter 9


The Group of Units and Euler’s 
Function 


9.1 Groups and Number Systems 


There is a lot that the integers modulo n can teach us. We can approach new 
horizons by rethinking the problems we have just studied. 


9.1.1 Solving linear equations — again 


What is a group, again? As we saw in Section 8.3, a group is any ‘number 
system’ where we can solve linear equations. 


Example 9.1.1 Here are some familiar group examples. 


• The integers modulo $n$, $\mathbb{Z}_n$, is a group under addition. As an example,
$3 + x \equiv 2$ (mod 4) has a solution.

Namely, we use the (group) inverse, $-3 \equiv 1$, to solve it, so that

$$x \equiv 2 + (-3) \equiv 2 + 1 \equiv 3 \pmod 4$$

is the solution.


• Similarly, we can solve equations like $\frac{1}{3} \cdot x = 5$ over the rational numbers.
Why? Because $\frac{1}{3}$ has a (group) inverse in the group $\mathbb{Q} \setminus \{0\}$ (under
multiplication), namely $\left(\frac{1}{3}\right)^{-1} = 3$, and

$$x = 3 \cdot 5 = 15$$

does indeed solve this equation.


Let us use this idea to help us with solving congruences modulo n. Using 
the above framework, I should be able to solve 


$$43x \equiv 2 \pmod{997}$$

by using something like $a = 43^{-1}$, the notation we saw before.
That would get us

$$x \equiv 2a \equiv 2 \cdot 43^{-1} \pmod{997}.$$


Let’s try this in Sage.


a=mod(43,997)
x=2*a^-1
print("a is %s"%a)
print("a^-1 is %s"%a^-1)
print("2a^-1 is %s"%x)


a is 43
a^-1 is 371
2a^-1 is 742


This checks out, of course:


mod(43*742,997)


2 


We can similarly try to solve with a composite modulus:

$$53y \equiv 29 \pmod{100}$$

using $b = 53^{-1}$, so that

$$y \equiv 29 \cdot b \equiv 29 \cdot 53^{-1} \pmod{100}.$$


y=29*mod(53,100)^-1
print("y is %s"%y)


y is 93


y=29*mod(53,100)^-1
53*y


29 
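Outside of Sage, the same "multiply by the inverse" idea can be sketched in plain Python (3.8 or later), where the built-in `pow(a, -1, n)` computes a modular inverse. The function name `solve_linear_congruence` is hypothetical, and the sketch assumes $\gcd(a, n) = 1$.

```python
def solve_linear_congruence(a, b, n):
    """Solve a*x = b (mod n) via x = b * a^(-1), assuming a is invertible modulo n."""
    a_inv = pow(a, -1, n)   # raises ValueError when gcd(a, n) != 1
    return (b * a_inv) % n

print(solve_linear_congruence(43, 2, 997))
print(solve_linear_congruence(53, 29, 100))
```

Both results agree with the Sage cells above; trying a coefficient that shares a factor with the modulus raises an error instead of returning an answer.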


9.1.2 A new group 
9.1.2.1 The group of units 


So solving this should often be possible. But it can’t always work, otherwise I 
could use it to solve something like 


52y = 29 (mod 100) 


and we already know this does not have a solution. We can’t just use this idea 
willy-nilly; indeed, there isn’t a 527! in this case. 

Hence we introduce a new group — and it’s even a simple set to define. 
Definition 9.1.2 We let $U_n$, the group of units modulo $n$, be the set of
equivalence classes $[a]$ modulo $n$ such that $\gcd(a,n) = 1$.

This will be the set where we are allowed to do inverses, and hence to solve 
things easily. Recall Definition 5.3.4 and Question 5.3.6. 


Example 9.1.3 Before going on, figure out for yourself the elements of Us and 
Us. 


Now, naming something doesn’t guarantee it’s useful, or that it performs 
as claimed! So we need to check some things from Definition 8.3.3. 


Proposition 9.1.4 The group of units is really a group. 
Proof. First, this is certainly a set. Since we earlier proved that any two
elements of a residue class have the same gcd with a modulus, the definition 
makes sense, and we know how to check if something is in it. 

Next, the set is associative with respect to multiplication, because it’s really 
the same as multiplication over Z. The identity element [1] is likewise inher- 
ited from Z. We have inverses because we only allow elements that will have 
solutions to ax = 1 according to Proposition 5.1.1; see also Question 5.3.6 and 
Exercise 5.6.5. 

Finally, we do need to check whether the multiplication is closed on this set.
After all, it’s not obvious that if $ax \equiv 1$ and $bx \equiv 1$ have solutions, then so
does $(ab)x \equiv 1$! But if $\gcd(a,n)$ and $\gcd(b,n)$ are both 1, then $ab$ will also be
coprime to $n$, which is all that is needed¹. All in all, that means $U_n$ really and
truly is a group.


9.1.2.2 More facts and examples 


The terminology units makes sense too. If you are in a number system with 
addition and multiplication, then a unit is an element that has a multiplicative 
inverse. 


Example 9.1.5 Here are some examples of units. 


• In the integers, $\pm 1$ are the units.

• More unusual is the set of complex numbers (!), which are all units (except
zero). In fact, the inverse of $r(\cos(\theta) + i\sin(\theta))$ is

$$\frac{1}{r}(\cos(-\theta) + i\sin(-\theta)).$$

• And $U_n$ is the set of all the integers modulo $n$ that have multiplicative
inverses. By our previous investigations, we know this is when $ax \equiv 1$
(mod $n$) has a solution. Since multiplication is the operation, there are
inverses!


Naturally, it can take a while to list all the elements of $U_n$, but it’s worth
doing. Try it for n = 10, n = 11, and n = 12 by hand (see Exercise 9.6.1). 

Sage has commands to list the group of units and give the order of the 
group. Try them interactively here, or individually below. 


@interact
def _(n=22):
    pretty_print(html("The units of $\\mathbb{Z}_{%s}$ are"%n))
    pretty_print(html(Integers(n).list_of_elements_of_multiplicative_group()))
    pretty_print(html("There are $%s$ of them."%Integers(n).unit_group_order()))


Sage note 9.1.6 Reminder to try things out. Remember, you can use 
these yourself by using these commands, or by cutting and pasting them in 
a Sage or Jupyter notebook, CoCalc, or command line interface. They are 
tedious to type, though! 


¹ Try proving this two ways, using tools in Chapter 3 and then those in Chapter 6.

Integers(50).list_of_elements_of_multiplicative_group()


[1, 3, 7, 9, 11, 13, 17, 19, 21, 23, 27, 29, 31, 33, 37, 
39, 41, 43, 47, 49] 


Integers(50).unit_group_order()


20 
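The definition of $U_n$ is simple enough to reproduce without Sage, which also makes it easy to test the closure claim from Proposition 9.1.4 numerically. This is a minimal sketch in plain Python; `units` and `is_closed` are hypothetical helper names, and the closure check is evidence for one modulus at a time, not a proof.

```python
from math import gcd

def units(n):
    """Least nonnegative residues a with gcd(a, n) = 1, representing U_n."""
    return [a for a in range(n) if gcd(a, n) == 1]

def is_closed(n):
    """Check that the product of any two units is again a unit modulo n."""
    U = units(n)
    return all((a * b) % n in U for a in U for b in U)

print(units(50), len(units(50)))
print(is_closed(50))
```

The list and its length should match the two Sage outputs above.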


9.2 The Euler Phi Function 


We give the size of the group of units (mod n) a special name. 


Definition 9.2.1 We give the order of $U_n$ the name $\phi(n)$. That is, by definition,

$$\phi(n) = |U_n|.$$

This is the so-called Euler $\phi$ function. It can also be written phi, it is
pronounced ‘fee’, and it’s occasionally notated $\varphi$ just for fun. We’ll meet Euler
many times in this text; see Historical remark 13.0.3.

Remark 9.2.2 Since modulo one everything is one, we say $U_1 = \{[1]\}$ and
$\phi(1) = 1$ since $\gcd(1,1) = 1$, despite the fact that also everything is zero. If
this bothers you, you are nearly at the algebraic notion of a field mentioned
toward the end of Section 8.1, and may wish to read some discussions of the
field with one element².

One of the most fun things to do with basic number theory is to explore 
new concepts with pencil and paper — because it really is tractable. 


Question 9.2.3 Do you see any patterns on the value of ¢(n)? 


9.2.1 Euler’s theorem 


So far this is a relatively abstract concept. What follows is not abstract at all, 
but very, very useful! Let’s follow the following argument to see what we can 
find out about $\phi(n)$.

Recall the notion of the order of an element (Definition 8.3.10). So any
random element $[a] \in U_n$ (for some $n$) has an order.


Example 9.2.4 For instance, the order of $[2]$ in $U_7$ is 3, because $[2]$ and $[2]^2$
are not 1, but $[2]^3 = 8 \equiv 1$ (mod 7).

This means we can apply the things we learned about orders, in particular
Theorem 8.3.12 of Lagrange. It stated that the order of any element of a finite
group divides the order of the group itself.
Think about what this implies for orders in $U_n$. First, $|a|$ divides $|U_n|$.
(For instance, in Example 9.2.4, 3 divides 6.) That can be rewritten as

$$|a| \mid \phi(n), \text{ or } \phi(n) = k|a|$$

for some positive integer $k$.
Finally, let’s apply this fact to powers of $a$.

$$a^{\phi(n)} \equiv a^{k|a|} \equiv \left(a^{|a|}\right)^k \equiv 1^k \equiv 1 \pmod n$$


² ncatlab.org/nlab/show/field+with+one+element
This is very interesting; without it, all we would know is that $a^{|a|} \equiv 1$ because
that’s the definition of what ‘order’ means. With it, we have proved one of the
many celebrated theorems of Leonhard Euler:


Theorem 9.2.5 Euler’s Theorem. If $\gcd(a,n) = 1$, then $a^{\phi(n)} \equiv 1$ (mod $n$).
Proof. See the preceding paragraphs.

Try verifying Euler’s Theorem for n = 12 and n = 11 for some simple a 
such as a = 3 or a= 5. Can you see how to recover Fermat’s Little Theorem 
from Euler’s Theorem, as a special case? (See Exercise 9.6.2.) 
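After trying a few cases by hand, it can be reassuring to let a computer check every unit at once. This is a minimal sketch in plain Python; `phi` here is a hypothetical helper that counts units straight from Definition 9.2.1 (slow, but transparent), and `euler_holds` only confirms the theorem for the one modulus you give it.

```python
from math import gcd

def phi(n):
    """Euler phi computed from the definition, as the number of units modulo n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def euler_holds(n):
    """Check a^phi(n) = 1 (mod n) for every a between 1 and n-1 coprime to n."""
    return all(pow(a, phi(n), n) == 1 for a in range(1, n) if gcd(a, n) == 1)

print(phi(12), phi(11))
print(euler_holds(12), euler_holds(11))
```

Looking at the value of `phi` for a prime modulus is a good hint for the Fermat question just asked.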


9.3 Using Euler’s Theorem 


Euler’s Theorem has many uses, especially theoretical ones we will use through- 
out. We will begin with its use in some computations we are already familiar 
with; see Section 10.5 for some more interesting computational uses. 


9.3.1 Inverses 


We can use it to compute inverses mod ($n$), with just a little cleverness. If

$$a^{\phi(n)} \equiv 1 \pmod n,$$

then certainly multiplying both sides by $a^{-1}$ yields

$$a^{\phi(n)-1} \equiv a^{-1} \pmod n.$$


We can check this using Sage. 


@interact
def _(a=3,n=10):
    a=mod(a,n)
    try:
        b = a^-1
        pretty_print(html(r"$%s^{-1}$ is $%s$ and $%s^{\phi(%s)-1}=%s^{%s-1}$ is also $%s$"%(a, b, a, n, a, euler_phi(n), a^(euler_phi(n)-1))))
    except:
        pretty_print(html("Don't forget to pick an $a$ that actually has an inverse modulo $n$!"))


Example 9.3.1 Let’s pick a congruence we wanted to solve earlier, like

$$53y \equiv 29 \pmod{100}$$

and try to solve it this way. Instead of all the stuff we did before, we could
just multiply both sides by the inverse of 53 in this form.

$$53y \equiv 29 \pmod{100}$$
$$53^{\phi(100)-1} \cdot 53y \equiv 53^{\phi(100)-1} \cdot 29 \pmod{100}$$

Now using Theorem 9.2.5, we get

$$1 \cdot y \equiv 29 \cdot 53^{\phi(100)-1} \pmod{100}.$$


One could conceivably do this power by hand using our tricks for powers; 
using a computer, it would look like the following in Sage. 
mod(29*53^(euler_phi(100)-1),100)


93 


This answer jells with our previous calculation. Better, I didn’t have to 
solve a different linear congruence in order to solve my original one; I just had 
to have a way to do multiplication mod (n). 
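The point that only modular multiplication is needed can be made concrete in plain Python, where the three-argument `pow` does fast modular powering. This is a minimal sketch; `phi` and `inverse_by_euler` are hypothetical helper names, and `phi` is the slow count-the-units version rather than anything clever.

```python
from math import gcd

def phi(n):
    """Number of a in 1..n with gcd(a, n) = 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def inverse_by_euler(a, n):
    """Inverse of a modulo n computed as a^(phi(n)-1); requires gcd(a, n) = 1."""
    if gcd(a, n) != 1:
        raise ValueError("a has no inverse modulo n")
    return pow(a, phi(n) - 1, n)

inv = inverse_by_euler(53, 100)
print(inv)                  # inverse of 53 modulo 100
print((29 * inv) % 100)     # solution of 53y = 29 (mod 100)
```

The second printed value should agree with the Sage computation above.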


Sage note 9.3.2 Euler phi in Sage. Notice that Sage has a
command to get the Euler phi function, namely euler_phi(n). This
doesn’t have the direct connection to the group, but is easier to use than
Integers(n).unit_group_order().


9.3.2 Using Euler’s theorem with the CRT 


We can use this to do Chinese Remainder Theorem systems much more easily,
as long as we have access to $\phi$.

Remember the algorithm for the CRT, where we tried to solve systems like 
this: 


° £ =a, (mod n1) 
© £ = a2 (mod ne) 


There, we had to calculate many solutions to congruences of the form
$$\frac{N}{n_i} x \equiv 1 \pmod{n_i}.$$
(This was to get the $d_i$ numbers.) Our new information means that this inverse
is just
$$\left(\frac{N}{n_i}\right)^{-1} \equiv \left(\frac{N}{n_i}\right)^{\phi(n_i)-1},$$
since we are looking at a congruence modulo $n_i$.
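So the things in the final solution which looked like
$$a_i \cdot d_i \cdot \frac{N}{n_i}$$
(with $d_i$ the inverse of $N/n_i$ modulo $n_i$) can be thought of as
$$a_i \left(\frac{N}{n_i}\right)^{\phi(n_i)-1} \left(\frac{N}{n_i}\right) = a_i \left(\frac{N}{n_i}\right)^{\phi(n_i)},$$
which is much cooler and simpler! So the answer to the general system is just
$$x \equiv \sum_{i=1}^{k} a_i \left(\frac{N}{n_i}\right)^{\phi(n_i)} \pmod{N}.$$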
a_1,a_2,a_3 = 1,2,3
n_1,n_2,n_3 = 5,6,7
N=n_1*n_2*n_3
print(N)

210

print(euler_phi(n_1), euler_phi(n_2), euler_phi(n_3))
print(mod(a_1*(N/n_1)^(euler_phi(n_1)) +
    a_2*(N/n_2)^(euler_phi(n_2)) +
    a_3*(N/n_3)^(euler_phi(n_3)),N))

4 2 6
206


Sage note 9.3.3 More complex list comprehension. It’s possible to do
the previous work more concisely, no matter how many congruences you have,
if you know a little Python and recall from Sage note 4.6.2 the little something
called a ‘list comprehension³’.

a_1,a_2,a_3 = 1,2,3
n_1,n_2,n_3 = 5,6,7
N=n_1*n_2*n_3
sum([mod(a*(N/n)^(euler_phi(n)),N) for (a,n) in
    [(a_1,n_1),(a_2,n_2),(a_3,n_3)]])

206

But that’s not necessary for our purposes.
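The general formula $x \equiv \sum_i a_i (N/n_i)^{\phi(n_i)}$ (mod $N$) can also be sketched as a small plain-Python function that accepts any number of congruences. The names euler_phi and crt_via_euler are hypothetical helpers for this sketch, not Sage commands, and we assume the moduli are pairwise coprime.

```python
from math import gcd

def euler_phi(n):
    """Brute-force phi(n): count 1 <= k <= n with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def crt_via_euler(congruences):
    """Solve x = a_i (mod n_i) for a list of pairs (a_i, n_i).

    Assumes the moduli n_i are pairwise coprime.
    """
    N = 1
    for _, n in congruences:
        N *= n
    total = 0
    for a, n in congruences:
        # (N/n)^phi(n) is 1 modulo n and 0 modulo every other modulus
        total += a * pow(N // n, euler_phi(n), N)
    return total % N

x = crt_via_euler([(1, 5), (2, 6), (3, 7)])
print(x)                     # 206
print(x % 5, x % 6, x % 7)   # 1 2 3
```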
Example 9.3.4 We can do this one step even better. Take a huge system like
• $3x \equiv 7 \pmod{10}$
• $2x \equiv 5 \pmod{9}$
• $4x \equiv 1 \pmod{7}$


Can we find solutions for this using the same mechanism? Yes, and without
too much difficulty now.
Since one can solve $bx \equiv c$ (mod $n$) with
$$x \equiv b^{\phi(n)-1} \cdot c,$$
any likely system of congruences with coprime moduli
$$b_i x \equiv c_i \pmod{n_i}$$
where $N$ is the product of the moduli could be solved by
$$x \equiv \sum_{i=1}^{k} \left(b_i^{\phi(n_i)-1} \cdot c_i\right) \left(\frac{N}{n_i}\right)^{\phi(n_i)} \pmod{N}$$


Let’s use this to solve this system; we print a few intermediate steps. 


c_1,c_2,c_3 = 7,5,1
m_1,m_2,m_3 = 10,9,7
M=m_1*m_2*m_3
b_1,b_2,b_3 = mod(3,M),mod(2,M),mod(4,M)
d_1,d_2,d_3 = mod(M/m_1,M),mod(M/m_2,M),mod(M/m_3,M)
print(b_1,b_2,b_3)
print(d_1,d_2,d_3)
print(b_1^(euler_phi(m_1)-1)*c_1*d_1^(euler_phi(m_1)) +
    b_2^(euler_phi(m_2)-1)*c_2*d_2^(euler_phi(m_2)) +
    b_3^(euler_phi(m_3)-1)*c_3*d_3^(euler_phi(m_3)))

3 2 4
63 70 90
79

³ docs.python.org/3/tutorial/datastructures.html#list-comprehensions


Notice that we make as much stuff modulo M to begin with as possible. 
Even for bigger numbers, asking Sage to first make things modular is a big 
help — it takes essentially no time! 
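As a cross-check of Example 9.3.4, here is a hedged plain-Python sketch of the same formula. The function name solve_system and the triple format (b, c, n) are our own choices for illustration; we assume pairwise coprime moduli and that each $b_i$ is a unit modulo $n_i$. As in the Sage cell, everything is reduced modulo the full product from the start.

```python
from math import gcd, prod

def euler_phi(n):
    """Brute-force phi(n): count 1 <= k <= n with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def solve_system(congruences):
    """Solve b_i * x = c_i (mod n_i) for a list of triples (b_i, c_i, n_i)."""
    N = prod(n for _, _, n in congruences)
    total = 0
    for b, c, n in congruences:
        phi = euler_phi(n)
        b_inverse = pow(b, phi - 1, N)  # reduces to the inverse of b modulo n
        selector = pow(N // n, phi, N)  # 1 modulo n, 0 modulo the other moduli
        total += b_inverse * c * selector
    return total % N

x = solve_system([(3, 7, 10), (2, 5, 9), (4, 1, 7)])
print(x)                                               # 79
print((3 * x) % 10, (2 * x) % 9, (4 * x) % 7)          # 7 5 1
```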


Example 9.3.5 We can demonstrate this with much larger examples, picking
essentially random large primes $m_i$ to compute with.


• $3x \equiv 7 \pmod{m_1}$
• $2x \equiv 5 \pmod{m_2}$
• $4x \equiv 1 \pmod{m_3}$


In the first one, we choose primes in the ten thousands. 


c_1,c_2,c_3 = 7,5,1
m_1,m_2,m_3 = random_prime(10000), random_prime(20000),
    random_prime(30000)
M=m_1*m_2*m_3
b_1,b_2,b_3 = mod(3,M),mod(2,M),mod(4,M)
d_1,d_2,d_3 = mod(M/m_1,M),mod(M/m_2,M),mod(M/m_3,M)
print("Our primes are %s, %s, and %s"%(m_1,m_2,m_3))
print(b_1^(euler_phi(m_1)-1)*c_1*d_1^(euler_phi(m_1)) +
    b_2^(euler_phi(m_2)-1)*c_2*d_2^(euler_phi(m_2)) +
    b_3^(euler_phi(m_3)-1)*c_3*d_3^(euler_phi(m_3)))


It’s worth trying to time this; recall that we can use %time for this in
notebooks (see Sage note 4.2.1). The second example uses primes up to a few
hundred million.


c_1,c_2,c_3 = 7,5,1
m_1,m_2,m_3 = random_prime(10^8), random_prime(2*10^8),
    random_prime(3*10^8)
M=m_1*m_2*m_3
b_1,b_2,b_3 = mod(3,M),mod(2,M),mod(4,M)
d_1,d_2,d_3 = mod(M/m_1,M),mod(M/m_2,M),mod(M/m_3,M)
print("Our primes are %s, %s, and %s"%(m_1,m_2,m_3))
b_1^(euler_phi(m_1)-1)*c_1*d_1^(euler_phi(m_1)) + \
    b_2^(euler_phi(m_2)-1)*c_2*d_2^(euler_phi(m_2)) + \
    b_3^(euler_phi(m_3)-1)*c_3*d_3^(euler_phi(m_3))


9.4 Exploring Euler’s Function 


One of the neatest things about $\phi(n)$, beyond it being quite useful for things we
are familiar with (congruences), is that it is a prototype for the many functions
there are in number theory. So we will look at it in a bit more depth.

Let’s get some more conjectures about values of $\phi(n)$. Finding patterns is
fun!

One pattern we saw is Theorem 9.2.5, that if $\gcd(a,n) = 1$, then $a^{\phi(n)} \equiv 1$
(mod $n$). But there are some other places one might look for patterns, now
that one has done some number theory. These are questions the Fundamental
Theorem of Arithmetic just begs us to ask, regarding a possible formula.


Question 9.4.1 One can ask:
• Given a prime $p$, is there a formula for $\phi(p^e)$?
• If $m$ and $n$ are coprime, is there a relation between $\phi(mn)$ and $\phi(m)$ and
$\phi(n)$?


What happens in the latter case for $n = 15$ and $m = 16$? Can you do it by
hand?

There are a lot of other interesting questions one can ask about this function 
which aren’t directly related to a formula. 


Question 9.4.2 For instance, one can ask:
• When does $\phi(n) \mid n$?
• When (if ever) does $\phi(m) \mid \phi(n)$? (See Exercise 9.6.18.)
• Given $m$, for how many integers $n$ is it true that $\phi(n) = m$?
• Are there infinitely many $n$ for which $\phi(n)$ ends in zero? (See Exercise 9.6.17.)


One can also ask questions about new, related functions. For instance, let
$f(n) = \phi(n)/n$. Can you find a formula? Where is this function equal to
certain values, such as $f(n) = 1/2$? (See Exercise Group 9.6.14-16.)

Quite surprisingly, there is an additive result as well. Try adding up
$$\sum_{d \mid n} \phi(d)$$
for small values of $n$ to seek a pattern! (Try it interactively below.)


@interact
def _(n=range_slider(2,150,1,(2,20))):
    top = n[1]
    bottom = n[0]
    cols = ((top-bottom)//10)+1
    T = [cols*['$n$',r'$\phi(n)$']]
    list = [[i,euler_phi(i)] for i in range(n[0],n[1])]
    list.extend((10-(len(list)%10))*['',''])
    for k in range(10):
        t = [item for j in range(cols) for item in
            list[k+10*j]]
        T.append(t)
    pretty_print(html(table(T, header_row = True, frame =
        True)))
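For the additive question, a short plain-Python sketch can list the values $\phi(d)$ over the divisors $d$ of $n$ together with their total, so that a pattern can be looked for. The helper names (euler_phi, divisors, phi_divisor_sum) are ours, chosen for this illustration only.

```python
from math import gcd

def euler_phi(n):
    """Brute-force phi(n): count 1 <= k <= n with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def phi_divisor_sum(n):
    """Add up phi(d) over all divisors d of n."""
    return sum(euler_phi(d) for d in divisors(n))

# One row per n: the divisors, their phi values, and the total
for n in range(1, 13):
    ds = divisors(n)
    print(n, ds, [euler_phi(d) for d in ds], phi_divisor_sum(n))
```

Changing the range in the final loop gives more rows to stare at.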


Remark 9.4.3 Before moving on to some proofs in the next section, we highly
encourage all readers to explore many questions, perhaps using the interact
above. It’s simply not the same to just prove, and even less so to read someone
else’s proof. To really understand these (or other) things in mathematics, one
must get a feel for them “by hand”.


9.5 Proofs and Reasons


In this text, we try to strike a balance between exploration and proof. The 
point is that number theory is both of these things. Exploration is wonderful, 
but we will see a number of times where we really do need the proof to avoid 
error. Nonetheless, do not start this section before really trying things! 

In a good proof, the techniques will not just prove that things are true, but 
lend insight into why they are true. The proofs here have this trait. 


9.5.1 Computing prime powers 


With some effort above, you should have seen a pattern for $\phi(p^e)$. Let’s prove
this.


Fact 9.5.1
$$\phi(p^e) = p^e - p^{e-1} = \left(1 - \frac{1}{p}\right) p^e$$
Proof. What we want is the number of positive numbers (!) coprime to $p^e$ and
less than $p^e$.
The most important point is that any number which is not coprime to $p^e$ must
share a prime factor with it, which must be $p$. Likewise, any number divisible
by $p$ is not coprime to $p^e$, so this is a necessary and sufficient condition.
Now we just need to count these numbers. But all the numbers less than or
equal to $p^e$ which have a factor of $p$ are just the multiples of $p$, which occur
every $p$th element. Since $p^e$ itself is the $p^{e-1}$th such multiple, there are exactly
$p^{e-1}$ such integers not coprime to $p^e$.
Subtract; there are
$$p^e - p^{e-1}$$
elements which are coprime. ∎
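The counting in this proof is easy to test on small cases. The following plain-Python sketch (helper names are ours) counts the multiples of $p$ up to $p^e$ directly and compares the brute-force value of $\phi(p^e)$ with $p^e - p^{e-1}$; it is a sanity check on a few prime powers, not a replacement for the proof.

```python
from math import gcd

def phi_by_counting(n):
    """Brute-force phi(n): count 1 <= k <= n with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def multiples_of_p(p, e):
    """The numbers in 1..p^e that share a factor with p^e (the multiples of p)."""
    return [k for k in range(1, p**e + 1) if k % p == 0]

# Columns: p, e, phi by counting, p^e minus the multiples of p, the formula
for p, e in [(2, 3), (3, 2), (5, 2), (7, 1)]:
    not_coprime = len(multiples_of_p(p, e))
    print(p, e, phi_by_counting(p**e), p**e - not_coprime, p**e - p**(e - 1))
```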


9.5.2 Multiplicativity 


The most interesting proof is that of the following fact⁴ about $\phi$ applied to
certain products. Later (Definition 18.1.2) we will see this has proved that $\phi$
is multiplicative.


Fact 9.5.2 If $\gcd(m,n) = 1$, then $\phi(mn) = \phi(m) \cdot \phi(n)$.
Proof. Take the integers from 1 to $mn$ and arrange them in an array like so, with
$n$ rows and $m$ columns:
$$\begin{array}{ccccc}
1 & 2 & 3 & \cdots & m \\
m+1 & m+2 & m+3 & \cdots & 2m \\
\vdots & \vdots & \vdots & & \vdots \\
(n-1)m+1 & (n-1)m+2 & (n-1)m+3 & \cdots & nm
\end{array}$$
Notice that only some of the columns contain elements of $U_m$, namely, the
columns with $km + \ell$ where $\gcd(\ell,m) = 1$. The others necessarily share nontrivial
factors with $m$, so we focus on the $\phi(m)$ columns like this where all
elements are coprime to $m$.

Now within each such column, I claim there are all possible classes in $\mathbb{Z}_n$. Why?


• Suppose that two elements of the $\ell$ column are the same equivalence class.
Then $km + \ell \equiv k'm + \ell$ (mod $n$).
• In that case we cancel $\ell$ to get $km \equiv k'm$ (modulo $n$ as always), and we
can also cancel $m$, since we already know it is coprime to $n$. That leads
to $k \equiv k'$. (See also Section 9.7.)


In particular, each class is only represented once in each column.

That means that each relevant column has exactly $\phi(n)$ elements in it which
are coprime to $n$ (though which rows these elements are in will depend upon
the column). In total we have $\phi(m)\phi(n)$ of them! ∎

⁴ We use a standard proof such as in [E.2.4] or [E.2.1]; it is also possible to use Fact 9.5.4
and Proposition 23.4.11 as in [E.2.13] or [E.5.1], but for this particular function this strategy
seems more illuminating.


Example 9.5.3 It can be easier to see with an example, say $mn = 15$ (with
$m = 5$ and $n = 3$). Try the following interact if you are online. The elements
that are units modulo $mn$ are marked with exclamation points.


@interact
def _(m=(5,[2..10]),n=(3,[2..10])):
    T = [['$[%s]$'%i for i in [1..m]]]
    for k in range(n):
        t = []
        for i in [1+k*m..m+k*m]:
            if gcd(i,m*n)==1:
                t.append('$%s$ !'%i)
            else:
                t.append('$%s$'%i)
        T.append(t)
    pretty_print(html(table(T, header_row=True, frame=True)))


Warning! If you pick an m and n which aren’t coprime, you’ll see how the 
exclamation points don’t come in the right amounts or the right places for the 
proof. 

Again, since there are ¢(m) columns with ¢(n) elements in them, all co- 
prime to both m and n, that means there are ¢(m)¢(n) elements coprime to 
mn, which proves what we wanted. 
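If you are not online, the column count from the proof can be sketched in plain Python. The function unit_counts_by_column is a hypothetical helper of ours; it builds the same array with $n$ rows and $m$ columns and records, for each column $\ell$, how many entries are coprime to $mn$.

```python
from math import gcd

def unit_counts_by_column(m, n):
    """Map each column ell (1..m) to the number of its entries coprime to m*n.

    Column ell of the n-by-m array holds ell, m + ell, ..., (n-1)m + ell.
    """
    counts = {}
    for ell in range(1, m + 1):
        column = [k * m + ell for k in range(n)]
        counts[ell] = sum(1 for v in column if gcd(v, m * n) == 1)
    return counts

counts = unit_counts_by_column(5, 3)
print(counts)                # {1: 2, 2: 2, 3: 2, 4: 2, 5: 0}
print(sum(counts.values()))  # 8
```

For $m = 5$ and $n = 3$, each of the four columns with $\gcd(\ell, 5) = 1$ contributes $\phi(3) = 2$ units and the last column contributes none, for a total of $\phi(15) = 8$.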


9.5.3 Addition Formula 


If you were diligent in your exploration, you will have discovered that
$$\sum_{d \mid n} \phi(d) = n.$$


We will prove this carefully, using subsets. We will gain insight of a combina- 
torial nature — that there are two ways to count n, one of which is precisely 
about finding numbers coprime to divisors of n. 

To really understand this proof, it is best to follow along with n = 15. 


Fact 9.5.4
$$\sum_{d \mid n} \phi(d) = n$$


Proof. In order to show this, we will take the set $\{1,2,3,\ldots,n\}$ and partition
it into subsets of numbers that each have the same gcd with $n$. If we can show
there are $\phi(n/d)$ numbers having each possible gcd $d$, then that totals up to $n$.
Indeed, the only possibilities for greatest common divisor with $n$ are the $k$
various divisors $\{d_i\}_{i=1}^{k}$ of $n$, so that each subset corresponds to one of these
divisors. Our subsets then look like
$$\{a \in \mathbb{Z} \mid 1 \leq a \leq n,\ \gcd(a,n) = 1 = d_1\}, \ldots, \{a \in \mathbb{Z} \mid 1 \leq a \leq n,\ \gcd(a,n) = d_k\}.$$


Let’s look at these sets more carefully. Each one consists of numbers sharing
divisor $d_i$ with $n$. So, if we wanted to, we could divide all the numbers in the
$i$th set
$$\{a \in \mathbb{Z} \mid 1 \leq a \leq n,\ \gcd(a,n) = d_i\}$$
by their common factor $d_i$.
That new set will be the set of positive numbers $b \leq \frac{n}{d_i}$ also coprime to $\frac{n}{d_i}$.
So the size of the subset of numbers having gcd $d_i$ with $n$ is the same as the
number of these $b$ coprime to $\frac{n}{d_i}$.
More precisely, if we look at all the original subsets in question, they have the
same sizes as the following sets:
$$\{b \in \mathbb{Z} \mid 1 \leq b \leq n/1,\ \gcd(b,n/1) = 1\}, \ldots, \{b \in \mathbb{Z} \mid 1 \leq b \leq n/n,\ \gcd(b,n/n) = \gcd(b,1) = 1\}.$$


These new sets $\{b \in \mathbb{Z} \mid 1 \leq b \leq n/d_i,\ \gcd(b,n/d_i) = 1\}$ themselves are
different from before (and possibly not disjoint). But their sizes (or cardinalities)
are the same as before, and the old sets were all disjoint, so we conclude that
$$n = \phi(n/d_1) + \phi(n/d_2) + \cdots + \phi(n/d_k) = \phi(n) + \phi(n/d_2) + \cdots + \phi(1).$$
But the set of numbers $\frac{n}{d_i}$ for all divisors $d_i$ of $n$ is also the set of all divisors
of $n$, so we can rewrite the sum as desired!
$$n = \sum_{d \mid n} \phi(d)$$
∎
Some readers will want to know this will be revisited in a far more sophis- 
ticated way in Example 23.2.4. 
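Following along with $n = 15$ can be done by hand, or with a small plain-Python sketch like the one below. The helper gcd_classes is our own (hypothetical) name; it partitions $\{1,\ldots,n\}$ by gcd with $n$ and prints each class next to $\phi(n/d)$, so the two counts in the proof can be compared directly.

```python
from math import gcd

def euler_phi(n):
    """Brute-force phi(n): count 1 <= k <= n with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def gcd_classes(n):
    """Partition {1, ..., n} into lists keyed by the value of gcd(a, n)."""
    classes = {}
    for a in range(1, n + 1):
        classes.setdefault(gcd(a, n), []).append(a)
    return classes

n = 15
classes = gcd_classes(n)
for d in sorted(classes):
    # Each class with gcd d has as many elements as phi(n/d)
    print(d, classes[d], len(classes[d]), euler_phi(n // d))
```

For $n = 15$ the classes have sizes 8, 4, 2 and 1 (for $d = 1, 3, 5, 15$), which are $\phi(15)$, $\phi(5)$, $\phi(3)$ and $\phi(1)$.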


9.5.4 Even more questions 


There are lots of other interesting questions to tackle. Go back to the beginning 
of Section 9.4 and look at some of the questions you didn’t yet explore. You 
now have the tools you need to tackle such questions, and even to prove things 
about them. The structure of ¢ is very regular! 


9.6 Exercises 


1. Compute the group of units $U_n$ for $n = 10, 11, 12$.

2. Prove Theorem 7.5.3 as a corollary of Theorem 9.2.5.

3. Prove that if $p$ is prime, then $a^p \equiv a$ (mod $p$) for every integer $a$.

4. Use Exercise 9.6.3 to prove the polynomial $x^5 - x + 2$ has no integer roots
(see Section 4.5 for context).

5. Formally prove that $\phi(p) = p - 1$ for prime $p$, by deciding which $[a] \in
\{[0], [1], [2], \ldots, [p-2], [p-1]\}$ have $\gcd(a,p) = 1$.

6. Verify Euler’s Theorem by hand for $n = 15$ for all relevant $a$ (note that
$\phi(15) = 8$, and remember that $a^8 = ((a^2)^2)^2$ so we can use modulo reduction
at each squaring).

7. Get the inverse of 29 modulo 31, 33, and 34 using Euler’s Theorem.

8. Evaluate without a calculator $11^{49}$ (mod 21) and $139^{112}$ (mod 27).

9. Solve the congruence $33x \equiv 29$ (mod 127) and (mod 128).


10. Solve as many of the systems of congruences we already did in Exercises 5.6
using the Chinese Remainder Theorem and Euler’s Theorem as you need
in order to understand how it works. Follow the models closely if necessary.

11. Use the facts from Section 9.5 to create a general formula for $\phi(N)$ where
$N = \prod_{i=1}^{k} p_i^{e_i}$. Then prove it by induction.

12. Conjecture and prove a necessary (or even sufficient) criterion for when
$\phi(n)$ is even. (Thanks to Jess Wild.)

13. Compute the $\phi$ function evaluated at 1492, 1776, and 2001.


Exercise Group. Let $f(n) = \phi(n)/n$.
14. Show that $f(p^e) = f(p)$ if $p$ is prime.
15. Find the smallest $n$ such that $f(n) < 1/5$.
16. Find all $n$ such that $f(n) = 1/2$.

17. Prove whether there are infinitely many values of $\phi$ that end in zero.

18. Conjecture whether there are any relations between $m$ and $n$ that might
lead $\phi(m)$ to divide $\phi(n)$.

19. Look up the Carmichael conjecture about $\phi$. What does it say, and what
is the current status⁵ of this conjecture?


20. For those with an interest in computer science: Look up the Busy Beaver
problem for Turing machines. Then read this description of how Euler’s
Theorem helps prove the immense size of the current record holder⁶ (as
of this writing). Explain in your own words how it helped, but also what
additional (number-theoretic) tools were needed.


9.7 The Conductor, solved 


Do you remember A First Problem from the prologue? Somewhat surpris- 
ingly, perhaps, the same train of ideas from the proof that ¢ is multiplicative 
(Fact 9.5.2) can lead us finally to a nice proof of a formula for the conductor of 
any pair of relatively prime integers m,n. And this will be a concrete formula 
and proof we can actually understand! 


Example 9.7.1 As before, let us take a concrete example, for $m = 3, n = 5$.
In Table 9.7.2 the first row indicates that each column is in one of the three
equivalence classes modulo 3. The ones which can be written $mx + ny$ are
underlined (shown here between underscores).


Table 9.7.2 Example of conductor analysis

[0]    [1]    [2]
_0_     1      2
_3_     4     _5_
_6_     7     _8_
_9_    _10_   _11_
_12_   _13_   _14_

⁵ Be wary of commercials mentioning it; see the May 2019 Notices of the AMS! (See
bit.ly/2sNH6B7 and search www.ams.org for ‘drivetime notices’)
⁶ www.sligocki.com/2022/06/21/bb-6-2-t15.html

In each column, look at the lowest number that can be represented. Do all of
these have something in common? You may also want to see any commonalities
in the numbers which cannot be represented.

To complement the table, try the following interact if you are online. This
time elements that do have a representation as $mx + ny$ for nonnegative $x, y$
are indicated with exclamation points, by analogy with Example 9.5.3.


@interact
def _(m=(3,[2..10]),n=(5,[2..10])):
    # every representable number below m*n has 0 <= x < n and 0 <= y < m
    them = set([m*x+n*y for x in srange(n) for y in
        srange(m)])
    T = [['$[%s]$'%i for i in [0..m-1]]]
    for k in range(n):
        t = []
        for i in [0+k*m..m-1+k*m]:
            if i in them:
                t.append('%s !'%i)
            else:
                t.append('$%s$'%i)
        T.append(t)
    pretty_print(html(table(T, header_row=True, frame=True)))


In each column (that is to say, in each residue class modulo $m$) the lowest
number that can be represented is a multiple of the other number $n$. We can
justify this. If $mx + ny$ is representable, then since $x, y \geq 0$ we can just subtract
off multiples of $m$ until the number is just a multiple of $n$, which is obviously
representable.

Now let’s consider those multiples of n, but regarded modulo m (so, in 
different columns). Those must all land in different residue classes modulo m, 
presumably not in the same order as the usual order: 


$$\{[0], [n], [2n], [3n], \ldots, [(m-2)n], [(m-1)n]\} = \{[0], [1], \ldots, [m-1]\}.$$
In Example 9.7.1 we obtained
$$\{[0], [5], [10]\} = \{[0]_3, [2]_3, [1]_3\}.$$
To see this⁷, consider that if
$$kn \equiv k'n \pmod{m}$$
then we can just cancel $n$ since $\gcd(m,n) = 1$, and so
$$k \equiv k' \pmod{m}.$$

Significantly, all numbers in each residue class (modulo m) greater than kn are 
also representable, since they are by definition a multiple of m greater than kn; 
since all residue classes are represented, this means there is a greatest number 
beyond which all are representable (the conductor). 

⁷ Showing this fact was actually part of our proof of Fermat’s Little Theorem, done in
Exercise 7.7.10, but for completeness we include it now.

So what is the conductor? All these multiples of $n$ are representable, so
the largest numbers which are not representable in each class modulo $m$ must
be $kn - m$. The biggest of those is clearly the one with the biggest $k$, namely
$k = m - 1$, so
$$(m-1)n - m$$
is the biggest number that can’t be represented.
Alternately, we can write
$$(m-1)n - m + 1 = mn - n - m + 1 = (m-1)(n-1)$$
for the smallest number above which all are represented, and we have a formula
for the conductor.
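The formula can be compared against a brute-force search. The sketch below is plain Python with hypothetical helper names (representable, conductor_by_search); it assumes that searching below $mn$ suffices, which the formula itself guarantees, so it is a check of the formula rather than an independent derivation.

```python
from math import gcd

def representable(z, m, n):
    """True when z = m*x + n*y for some integers x, y >= 0."""
    return any((z - n * y) % m == 0 for y in range(z // n + 1))

def conductor_by_search(m, n):
    """Smallest c such that every integer >= c is representable.

    Assumption: for coprime m, n no gap occurs at or above m*n,
    so it is enough to search the range 0..m*n-1.
    """
    gaps = [z for z in range(m * n) if not representable(z, m, n)]
    return gaps[-1] + 1 if gaps else 0

# Columns: m, n, conductor found by search, (m-1)(n-1)
for m, n in [(3, 5), (4, 7), (5, 8)]:
    print(m, n, conductor_by_search(m, n), (m - 1) * (n - 1))
```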

We can summarize the entire discussion above as a proof of the first half of
a solution⁸ to Exercise 1.4.7.


Fact 9.7.3 Given $m, n$ positive coprime integers, the conductor exists and is
$(m-1)(n-1)$. Exactly half the integers less than the conductor cannot be
represented (and so half of them can be).

Proof. See above for the formula for the conductor. Then we want to show
that exactly half of the numbers below the conductor, including the (obviously)
representable 0, are in fact representable. We will do this by pairing up the
numbers from 0 to $(m-1)n - m$ in a way such that each pair adds up to
$(m-1)n - m$. One of each pair will be representable, yielding the conclusion.
(That the numbers arrange in pairs follows from noting that since $\gcd(m,n) =
1$, at least one of $m, n$ is odd, so there are an even number of integers from 0
to $(m-1)n - m$.)

Suppose that $0 \leq z \leq (m-1)n - m$ is representable, so that
$$z = mx + ny, \quad x, y \geq 0.$$
Then consider the ‘complement’
$$z' = (m-1)n - m - z = m(-1-x) + n(m-1-y).$$
(In Example 9.7.1 we could consider $z = 5$ and $z' = 2$, where $x = 0, y = 1$.)
Since $x, y \geq 0$, certainly $(-1-x) < 0$. So it sure looks like $z' = m(-1-x) +
n(m-1-y)$ is not representable. Of course, it’s possible that $z'$ could be
written in some other representable way by adding some $m$s and subtracting
some $n$s.

However, because $m$ and $n$ are coprime (think back to our methods in Section 3.1),
the minimum possible number of each of these needed to do this
would be $n$ added $m$s, and $m$ subtracted $n$s. Then we could write
$$z' = m(n-1-x) + n(-1-y),$$
but this has the problem now that $-1-y < 0$. Since any other representations
would need even fewer $m$s than the first representation, or even fewer $n$s than
the second, there isn’t any way to do so with $x, y$ nonnegative.

Finally, we can invert this argument to ensure this is a one-to-one correspondence
between representable numbers and the rest. By Definition 2.4.1, any
positive number $z'$ which is not representable must still have a representation
as $mx + ny$, just that either $x$ or $y$ (but not both) are positive. Pick the
representation $z' = mx + ny$ with the smallest positive $x$ (which exists by Axiom 1.2.1).
Then $z'$ can also be written as $m(x-n) + n(y+m)$ where now
$x - n < 0$, since $x$ was the smallest positive option for the coefficient of $m$.
Since $z' > 0$, then $y + m > 0$.

We can rewrite this as $0 < -y < m$ and $0 < x < n$. Then the ‘complement’
can be represented as follows, where in the last line we add $mn$ to the first
term and subtract it from the second term:
$$z = (m-1)n - m - z' = (m-1)n - m - [mx + ny]$$
$$= m(-1-x) + n(m-1-y)$$
$$= m(-1-x) + mn + n(m-1-y) - mn$$
$$= m(n-1-x) + n(-1-y).$$
Since $x - n < 0$, we know $n > x$ which means $n - x - 1 \geq 0$; since similarly
$-y > 0$ we know that $-1-y \geq 0$, so $z$ really is representable, and we have
completed the proof. ∎

⁸ See [E.2.1, Exercise 1.25, solution] for an argument based on the geometric ideas we
explored in Section 3.3.
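The pairing in this proof can be watched in action with a few lines of plain Python. This is an illustrative sketch only; representable and complement_pairs are hypothetical helper names of ours. Each representable $z$ between 0 and $(m-1)n - m$ is listed next to its complement $(m-1)n - m - z$.

```python
def representable(z, m, n):
    """True when z = m*x + n*y for some integers x, y >= 0."""
    return any((z - n * y) % m == 0 for y in range(z // n + 1))

def complement_pairs(m, n):
    """Pairs (z, complement of z) for each representable z in 0..(m-1)n - m."""
    top = (m - 1) * n - m
    return [(z, top - z) for z in range(top + 1) if representable(z, m, n)]

print(complement_pairs(3, 5))  # [(0, 7), (3, 4), (5, 2), (6, 1)]
```

In the example $m = 3$, $n = 5$, the second entries 7, 4, 2 and 1 are exactly the numbers that cannot be represented, so half of the eight numbers below the conductor are representable.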


Summary: The Group of Units and Euler’s Function 


This chapter uses the groups viewpoint of Chapter 8 to introduce the important
topic of units.


1. After an example revisiting solving linear congruences, we introduce the
group of units in Definition 9.1.2. Yes, we check it is a group.


2. In Definition 9.2.1 the Euler $\phi$ function is introduced, along with the
incredibly important Euler’s Theorem about powers of a number modulo
$n$.


3. We then use Euler’s Theorem in Section 9.3 to do computations with
inverses and the Chinese Remainder Theorem.


4. Explore! In Section 9.4 you are encouraged to think about not just a
formula for $\phi$, but more sophisticated properties thereof.


5. In the last major section of this chapter, we then prove the most important
formulas.
• In Fact 9.5.1 we get a formula for $\phi(p^e)$.
• In Fact 9.5.2 we see that $\phi$ is multiplicative, which should allow for
a general formula in Exercise 9.6.11.
• In Fact 9.5.4 there is a remarkable addition formula.

There are many computational Exercises, and we especially encourage trying
to explore enough to make conjectures like in Exercise 9.6.18. Finally, in Section 9.7
we solve the questions originally raised in Question 1.1.1.


Chapter 10 


Primitive Roots 


There is deeper structure in the group of units than one might at first suspect. 
This chapter explores that structure. 

To start off, remember our search for patterns in the powers of $a$ (mod $n$)?
That is, we looked for patterns in $a^b$ mod ($n$). One of the things we discovered
was Fermat’s Little Theorem, which was that the first and last columns of the
following graphic were the same color (representing one).


Figure 10.0.1 Colored table of powers modulo n = 11


There is lots left to discover, though. Can you find more by using the 
following interact? 


import matplotlib.pyplot as plt
from matplotlib.ticker import IndexLocator, FuncFormatter
@interact
def power_table_plot(p=(13,prime_range(5,50))):
    mycmap = plt.get_cmap('gist_earth',p-1)
    myloc = IndexLocator(floor(p/5),.5)
    myform = FuncFormatter(lambda x,y: int(x+1))
    cbaropts = { 'ticks':myloc, 'drawedges':True,
        'boundaries':srange(.5,p+.5,1)}
    P=matrix_plot(matrix(p-1,[mod(a,p)^b for a in range(1,p)
        for b in srange(p)]), cmap=mycmap, colorbar=True,
        colorbar_options=cbaropts, ticks=[myloc,myloc],
        tick_formatter=[None,myform])
    show(P,figsize=6)

Sage note 10.0.2 Reminder for colormaps. Remember, to get a grayscale
plot, just change the part with plt.get_cmap('gist_earth',...) to use
'gray', or some other colormap (see Sage note 8.2.2) of your choice.

Have you made the observation that sometimes we get all colors in a single
row? This means that (at least sometimes) $a^b$ (mod $n$) goes through every
single number when we do enough powers $a^b$.

It turns out that this concept has a name, and is the last of the big concepts 


of basic congruence number theory. 


10.1 Primitive Roots 


10.1.1 Definition 


Definition 10.1.1 We say that $a \in U_n$ is a primitive root of $n$ when $a^b$ runs
through all elements of $U_n$ for $1 \leq b \leq \phi(n)$. ◊

Or, you can say the row corresponding to a primitive root hits all the
possible colors in the visualization! For composite $n$, this won’t mean all colors
per se, just all colors that represent units. (See the colorbar below.) So for
such moduli, we shrink the number of rows down for this visualization; it has
rows only for the elements of $U_n$.

By the way, can you ‘see’ Euler’s Theorem in this graphic? (Don’t forget 
that it generalizes Fermat’s Little Theorem.) Try exploring it in the interact 
as well, which allows not just for prime moduli but composite ones as well. 


Figure 10.1.2 Colored table of powers (of units) modulo n = 10


import matplotlib.pyplot as plt
from matplotlib.ticker import IndexLocator, FuncFormatter
@interact
def power_table_plot(modulus=(10,srange(3,50))):
    Zm = Integers(modulus)
    ls = Zm.list_of_elements_of_multiplicative_group()
    mycmap = plt.get_cmap('jet',modulus-1)
    myloccb = IndexLocator(ceil(modulus/10),.5)
    myloc = myloccb
    myform = FuncFormatter(lambda x,y:
        ls[min(int(x),len(ls)-1)])
    cbaropts = { 'ticks':myloccb, 'drawedges':True,
        'boundaries':srange(.5,modulus+.5,1)}
    P=matrix_plot(matrix(euler_phi(modulus),
        [mod(a,modulus)^b for a in range(1,modulus) for b in
        srange(euler_phi(modulus)+1) if gcd(a,modulus)==1]),
        cmap=mycmap, colorbar=True,
        colorbar_options=cbaropts, ticks=[None,myloc],
        tick_formatter=[None,myform])
    show(P,figsize=6)


Sage note 10.1.3 Filtering list comprehensions. We are only looking 
at units here. Where does this show up in the code? The syntax [x for y 
in range(1,mod) if func(x)] takes list comprehensions to another level, by 
‘filtering’. This allows us to remove from the list anything which doesn’t fit what 
we want. In this case, we removed non-units; gcd(a,mod)==1 was required. 


10.1.2 Two characterizations 


Proposition 10.1.4 There are two equivalent ways to characterize/define a
primitive root of $n$ among numbers such that $\gcd(a,n) = 1$.


• We say that $a$ is a primitive root of $n$ if $a^b$ yields every element of $U_n$.
• We say that $a$ is a primitive root of $n$ if the order of $a$ is $\phi(n)$.

Proof. Why are these true? Recalling the terminology from Section 8.3, the
first one means that $U_n$ is a cyclic group (one all of whose elements are powers
of a specific element), and that $a$ is a generator of that group. This is the
more advanced point of view.

The second point of view also uses the group idea of the order of an element.
Remember, this is the smallest number of times you can multiply something
by itself and get 1 as a result. What would this idea mean without using the
terminology of groups? With that viewpoint, $k$ is the order of $a$ if $a^k \equiv 1$ (mod
$n$) and $a^b \not\equiv 1$ for $1 \leq b < k$. ∎
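Both characterizations are easy to compute for a small modulus, which gives a way to see them agree. The following is a plain-Python sketch with hypothetical helper names (units, powers, order), shown for the modulus 13; Sage users could instead use multiplicative_order() as in the interact of Subsection 10.2.1.

```python
from math import gcd

def units(n):
    """The elements of U_n, as integers between 1 and n-1 (for n >= 2)."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def powers(a, n):
    """The set of a^b mod n for 1 <= b <= phi(n)."""
    phi = len(units(n))
    return {pow(a, b, n) for b in range(1, phi + 1)}

def order(a, n):
    """Smallest k >= 1 with a^k = 1 (mod n); assumes gcd(a, n) == 1."""
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k

n = 13
phi = len(units(n))
by_powers = [a for a in units(n) if powers(a, n) == set(units(n))]
by_order = [a for a in units(n) if order(a, n) == phi]
print(by_powers)  # [2, 6, 7, 11]
print(by_order)   # [2, 6, 7, 11]
```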


10.1.3 Finding primitive roots 


As a first exercise, the gentle reader should figure out the orders of some
elements of some small groups of units. For $n \in \{5,7,8,9,10,12,14,15\}$, try
exploring $U_n$. There should be at least some primitive roots.


Question 10.1.5 In exploring $U_n$ for some $n \in \{5,7,8,9,10,12,14,15\}$:
e Were all elements primitive roots? 
e Did all of these groups have primitive roots? 


e Is it particularly fun to look for them? 


It’s useful to try looking for primitive roots by hand. However, it’s better 
to know whether one should bother to look, and hence to try to prove things 
about orders in general. 


10.2 A Better Way to Primitive Roots 


10.2.1 A useful lemma 


In order to find primitive roots, we might want a better approach than simply
trying every single power of $a$ for every $a$ until we find one. Let’s walk through
an example to motivate a new approach, using a small modulus.

Example 10.2.1 A motivating example. Let’s take a number $n$ such that
$\phi(n)$ has some, but not too many, factors; say, $n = 11$, $\phi(11) = 10$. Okay, we
know that every element $a \in U_{11}$ will have
$$a^{10} \equiv 1 \pmod{11},$$
but which elements don’t reach the unit before the tenth power?

We know by Theorem 8.3.12 that the order of an element has to divide
$\phi(11) = 10$, so we could try $a^2$ and $a^5$; no other $a^k$ could yield 1. In fact, if
those aren’t $\equiv 1$, there aren’t any other possible orders out there, so that $a$
would work as a primitive root.


• Let’s try this with $a = 2$.
$$2^2 \equiv 4 \not\equiv 1 \pmod{11} \text{ and } 2^5 \equiv 32 \equiv -1 \not\equiv 1 \pmod{11},$$
so 2 must be a primitive root.
• What about with $a = 3$?
$$3^2 \equiv 9 \not\equiv 1 \pmod{11},$$
so that seems promising, but
$$3^5 \equiv 9 \cdot 9 \cdot 3 \equiv (-2)^2 \cdot 3 = 12 \equiv 1 \pmod{11}$$
so 3 cannot be a primitive root modulo eleven (and in fact has order five).


The moral is that we didn’t have to check all ten possible powers of a = 2 
or a = 3 to decide whether a was a primitive root modulo eleven. If you aren’t 
confident of this idea, try using this strategy to determine which of a = 4,5,6 
is a primitive root (exactly one of them is). 


Now we formalize and rephrase our strategy slightly more efficiently. 


Sage note 10.2.2 How Sage does primitive roots. As far as I understand,
the following strategy is how even Sage tests for finding primitive roots, at
least for basic cases. You can check for yourself by looking at the code from
the component program, PARI¹; look for is_gener_expo and is_gener_Fp.


Lemma 10.2.3 Testing for Primitive Roots. An element $a \in U_n$ is a
primitive root if and only if
$$a^{\phi(n)/q} \not\equiv 1 \text{ in } U_n \text{ for each prime } q \mid \phi(n).$$
Proof. If $a$ is in fact a primitive root, then $\phi(n)$ is the smallest number $k$ such
that $a^k \equiv 1$, so certainly for numbers smaller than $\phi(n)$, like $\phi(n)/q$, those
powers shouldn’t be $\equiv 1$.
On the other hand, if $a$ isn’t a primitive root, then its order $k$ must be a proper
divisor of $\phi(n)$.
Now look at the prime divisors $q$ of $\phi(n)/k$. For such a divisor,
$$q \mid \phi(n)/k \text{ so } qk\ell = \phi(n) \text{ for some } \ell \in \mathbb{Z}.$$
That means $\phi(n)/q = k\ell$ and so the power $\phi(n)/q$ in the statement is actually
a multiple of the order $k$. Since $a^k \equiv 1$, then certainly
$$a^{k\ell} = a^{\phi(n)/q} \equiv 1 \pmod{n}$$
as well, which completes the proof. ∎


¹ pari.math.u-bordeaux.fr/cgi-bin/gitweb.cgi?p=pari.git;a=blob;f=src/basemath/arith1.c

This proof is a little terse, so let’s unpack this test. Essentially, we change
two things from the initial idea of trying all divisors of $\phi(n)$:
• Instead of trying powers which are divisors of $\phi(n)$, we try powers which
are $\phi(n)$ divided by divisors. So $2^5$ becomes $2^{10/2}$ and $3^2$ becomes $3^{10/5}$.
That seems like it’s not doing anything other than rewriting, but at least
it organizes things differently.
• Then, instead of having to try all $\phi(n)/d$, we use a trick to just need
prime divisors $d$. (See the proof.)


Doing some examples slowly will help it make sense. Once you have done so,
try the interact.


@interact
def _(n=(19,[2..100]),a=3):
    phi=euler_phi(n)
    pds=prime_divisors(phi)
    if gcd(a,n)!=1:
        pretty_print(html("Make sure $a$ and $n$ are relatively prime!"))
    else:
        a = mod(a,n)
        pretty_print(html("Is $%s$ a primitive root of $%s$?"%(a,n)))
        pretty_print(html(r"The prime divisors of $\phi(%s)=%s$ are $%s$"%(n,euler_phi(n),
            ','.join([str(pd) for pd in pds]))))
        pretty_print(html("The powers are "+' and '.join([r'$%s^{%s/%s}\equiv %s$'%(a,phi,pd,a^(phi/pd)) for pd in pds])))
        pretty_print(html("And the order of a=$%s$ is <tt>a.multiplicative_order()</tt>=$%s$"%(a,
            a.multiplicative_order())))


10.2.2 Using the test lemma 


If you tried various $n$ and various attempts at primitive roots $a$ above, you
will see that Lemma 10.2.3 really works. Make sure you are trying $a$ that are
actually coprime to $n$, though! As it turns out, there aren’t very many test
powers to try, since $\phi(n)$ in general doesn’t have a lot of prime divisors,
even if $n$ is a fairly large prime.

Why not try it by hand for $n = 17$? There is only one prime divisor of
$\phi(17)$, which makes things easier. Fill in Table 10.2.4, where PR means
primitive root.


Table 10.2.4 Check which numbers are primitive roots

a   | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
PR? | No |   |   |   |   |   |   |   |   |    |    |    |    |    |    | No

The lemma also makes easy some statements that would otherwise be quite
hard. For instance, you should (Exercise 10.6.2) see how to use the test lemma
to prove that if $a$ is a primitive root of $n$, then so is $a^{-1}$ (modulo $n$).

Here’s something harder, to show the power of this approach. 
Proposition 10.2.5 If $a$ is a primitive root of $n$, then so is $n-a$ if
$4 \mid \phi(n)$.
Proof. Let’s think in terms of powers. If $a^{\phi(n)/q} \not\equiv 1$, then

$$(n-a)^{\phi(n)/q} \equiv (-a)^{\phi(n)/q} \equiv (-1)^{\phi(n)/q} a^{\phi(n)/q}.$$

So, as long as $\phi(n)/q$ is even for all prime divisors $q$ of $\phi(n)$, the
two powers (the one of $a$ and the one of $n-a$) come to the same thing.

Since $\phi(n)$ is already assumed to be even, the only possible odd $\phi(n)/q$
comes from $q = 2$, but $\phi(n)$ is assumed to be divisible by four, so
$\phi(n)/q$ will be even. ∎


Example 10.2.6 If you did the table at the beginning of this subsection
properly, you will note that 3 and 14 are a pair of primitive roots of seventeen.
There should be three other such pairs.

On the other hand, from the proof we can see that if $\phi(n)$ is even, but not
divisible by four, then we expect that if $a$ is a PR then $n-a$ will not be. For
example, since two is a primitive root of eleven in Example 10.2.1, we expect
that nine is not; try computing this yourself.
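If you would like to check your table and these pairs away from a Sage cell, the test in Lemma 10.2.3 can be sketched in plain Python. This is only a minimal sketch: the helper names `totient`, `distinct_prime_factors` and `is_primitive_root` are made up for this illustration (they are not Sage commands), and the brute-force totient is only sensible for small moduli.

```python
from math import gcd

def totient(n):
    """Count the integers 1 <= a <= n that are coprime to n (brute force)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def distinct_prime_factors(m):
    """Return the distinct prime divisors of m by trial division."""
    primes, d = [], 2
    while d * d <= m:
        if m % d == 0:
            primes.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        primes.append(m)
    return primes

def is_primitive_root(a, n):
    """Lemma 10.2.3: a is a primitive root of n exactly when no power
    a^(phi(n)/q), with q a prime divisor of phi(n), is 1 modulo n."""
    if gcd(a, n) != 1:
        return False
    phi = totient(n)
    return all(pow(a, phi // q, n) != 1 for q in distinct_prime_factors(phi))

# Primitive roots of 17, grouped into the pairs (a, 17 - a) of Proposition 10.2.5.
roots_17 = [a for a in range(1, 17) if is_primitive_root(a, 17)]
pairs_17 = [(a, 17 - a) for a in roots_17 if a < 17 - a]
print(roots_17)
print(pairs_17)

# phi(11) = 10 is even but not divisible by four, so 2 and 11 - 2 = 9 need not agree.
print(is_primitive_root(2, 11), is_primitive_root(9, 11))
```

Only one test power is needed for 17, since 2 is the only prime divisor of $\phi(17) = 16$; for 11 there are two.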


10.3 When Does a Primitive Root Exist? 


Recall your experimentation in Subsection 10.1.3. You should have discovered 
that there is not always a primitive root. 


Fact 10.3.1 There is no primitive root for n = 12. 

Proof. See Exercise 10.6.4. ∎

This is also the case for $n = 8$ (Exercise 10.6.3). So, when do we have
primitive roots?


10.3.1 Primitive roots of powers of two 


We'll start this investigation by proving that most powers of 2 do not have 
primitive roots. The following should give you an error. 


power = 25
primitive_root(2^power)


Traceback (most recent call last): 
ValueError: no primitive root 


Proposition 10.3.2 For $k > 2$, there are no elements of $U_{2^k}$ that have
order $\phi(2^k) = 2^{k-1}$, because the highest order they can have is $2^{k-2}$.
Proof. Assume $n = 2^k$ for $k > 2$. (For $n = 2$ and $n = 4$, there are primitive
roots; check this if you haven’t already.) In Exercise 10.6.3 we show that
$n = 8$ does not have a primitive root. In particular, each element of $U_8$ has
order dividing $2^{3-2} = 2$, so that $a^2 \equiv 1 \pmod 8$ for all $a \in U_8$.
Think of $n = 8 = 2^3$ as a base case for induction on $k \geq 3$. Now assume by
induction that for $n = 2^k$ it is true that no element has order higher than
$2^{k-2}$. I.e.,

$$a^{2^{k-2}} \equiv 1 \pmod{2^k}.$$

By definition of divisibility, that means for any odd number $a$, we have that

$$a^{2^{k-2}} = 1 + 2^k m$$

for some integer $m$.
Next, let’s look at what happens to everything in modulus $2^{k+1}$. We want that

$$a^{2^{(k+1)-2}} = a^{2^{k-1}} \equiv 1 \pmod{2^{k+1}}.$$

While it’s easy to get $2^{k+1}$ from $2^k$, the only way to easily get
$a^{2^{k-1}}$ from $a^{2^{k-2}}$ is by squaring. (Recall Fact 4.5.5 where we found
powers quickly by using $\left(a^{2^e}\right)^2 = a^{2^{e+1}}$.)

So we write $a^{2^{k-1}}$ as a square, substitute the above, and look at the
remainders.

$$a^{2^{k-1}} = \left(a^{2^{k-2}}\right)^2 = (1 + 2^k m)^2 = 1 + 2^{k+1} m + 2^{2k} m^2$$

$$= 1 + 2^{k+1}\left(m + 2^{k-1} m^2\right) \equiv 1 \pmod{2^{k+1}}$$

By induction we are done; because the highest possible order of an element is
less than $\phi(2^k)$, there are no primitive roots modulo $2^k$ for $k > 2$.
(Remember by Lagrange’s Theorem on Group Order in any case the order is a
power of two.) ∎
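To see the squaring step of this induction with actual numbers, here is one instance worked by hand. The choices $a = 3$ and $k = 3$ are purely illustrative and not taken from the text.

```latex
\begin{align*}
3^{2^{3-2}} &= 3^2 = 9 = 1 + 2^3 \cdot 1 && (\text{so } m = 1),\\
3^{2^{4-2}} &= \left(3^2\right)^2 = (1 + 2^3)^2 = 1 + 2^4 + 2^6
             = 1 + 2^4\left(1 + 2^2 \cdot 1^2\right) = 81 \equiv 1 \pmod{2^4}.
\end{align*}
```

So 3 has order at most $2^{4-2} = 4$ modulo 16, well short of $\phi(16) = 8$.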


primitive_root (64) 


Traceback (most recent call last): 


ValueError: no primitive root 


Fact 10.3.3 It turns out that $\pm 5$ have order $2^{k-2}$ in $U_{2^k}$.
Proof. We won’t prove this, but it is easy if you use just a little group theory. ∎

One can also demonstrate this fact computationally for a given example.

@interact
def _(power=5):
    a = mod(5,2^power)
    pretty_print(html("Powers of $5$ modulo $2^{%s}$ are"%power))
    print([a^i for i in [1..2^(power-1)]])
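Both Proposition 10.3.2 and Fact 10.3.3 can also be spot-checked for several powers of two at once. The following is a plain-Python sketch rather than a Sage cell; `multiplicative_order` here is a small brute-force helper written for this illustration, not Sage's method of the same name.

```python
def multiplicative_order(a, n):
    """Smallest k >= 1 with a^k = 1 (mod n); assumes gcd(a, n) == 1."""
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k

for k in range(3, 9):
    n = 2 ** k
    # Largest order attained by any unit (that is, any odd residue) modulo 2^k.
    largest = max(multiplicative_order(a, n) for a in range(1, n, 2))
    print(k, largest, 2 ** (k - 2),
          multiplicative_order(5, n), multiplicative_order(-5, n))
```

For each $k$ in this range the largest order printed should agree with $2^{k-2}$, and so should the orders of $5$ and $-5$.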


10.3.2 Two important lemmas 


There follow two important lemmas² about order in the group of units used
for working with primitive roots, whose proofs are valuable exercises.


10.3.2.1 How the lemmas work 


Lemma 10.3.4 Suppose $p$ is prime and the (multiplicative) order of $a$ modulo
$p$ is $d$. If $b$ and $d$ are coprime, then $a^b$ also has order $d$ modulo $p$.
Proof. See Exercise 10.6.6. ∎

Lemma 10.3.5 Suppose $p$ is prime and $d$ divides $p-1$ (and hence is a possible
order of an element of $U_p$). There are at most $\phi(d)$ incongruent integers
modulo $p$ which have multiplicative order $d$ modulo $p$.
Proof. See Exercise 10.6.7. ∎

Before using them a lot, we should unpack these results a little bit. Here
is a first taste.

Fact 10.3.6 If there is one primitive root of $n$, then there are actually
$\phi(\phi(n))$ of them.
Proof. We will only deal with the case of $n = p$ prime (see Exercise 10.6.10 for
the rest).
In Lemma 10.3.4, let the order of $a$ be $p-1$. Then $a$ is a primitive root
modulo $p$, and so is $a^b$ for every $b$ coprime to $p-1$. Since there are
$\phi(p-1)$ of these, it satisfies the claim. By Lemma 10.3.5, there can’t be
more. ∎

² Or lemmata, but who’s counting?


It works; let’s check this out interactively. 


@interact
def _(p=(41, prime_range(100))):
    a = mod(primitive_root(p),p)
    pretty_print(html("$%s$ is a primitive root of $%s$, with order $%s$"
        %(a,p,p-1)))
    L = [(i,a^i,(a^i).multiplicative_order()) for i in range(2,p-1)
         if gcd(i,p-1)==1]
    for item in L:
        pretty_print(html(
            r"$%s^{%s}\equiv %s$ also has order $%s$ (and $\gcd(%s,%s)=1$)"
            %(a, item[0], item[1], item[2], item[0], p-1)))
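Fact 10.3.6 is stated for any $n$, not just primes, so it may be worth counting primitive roots for a few composite moduli as well. Here is a brute-force sketch in plain Python; the function names are hypothetical helpers for this illustration, and the list of moduli is just a sample.

```python
from math import gcd

def totient(n):
    """Count the integers 1 <= a <= n that are coprime to n (brute force)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def multiplicative_order(a, n):
    """Smallest k >= 1 with a^k = 1 (mod n); assumes gcd(a, n) == 1."""
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k

def primitive_roots(n):
    """All units modulo n whose order is the full phi(n)."""
    phi = totient(n)
    return [a for a in range(1, n)
            if gcd(a, n) == 1 and multiplicative_order(a, n) == phi]

for n in [7, 9, 11, 12, 18, 41]:
    roots = primitive_roots(n)
    # When the list is non-empty, its length should be phi(phi(n)).
    print(n, len(roots), totient(totient(n)), roots)
```

For 12 the list comes back empty, so the count $\phi(\phi(12)) = 2$ printed beside it is simply not attained; the fact only says what happens once one primitive root exists.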


10.3.2.2 How the lemmas (don’t) fail 


To continue, let’s pick a non-prime number we know something about to see 
how many numbers we have with a given order. 

We saw in Proposition 10.3.2 that powers of two (past 4) do not have
primitive roots, but $U_{2^k}$ does have lots of elements with the next smallest
possible order. So, for example, for $n = 32$ we can look at whether powers $b$
coprime to that order (8) of such an element are in fact also elements with the
same order.


@interact
def _(n=5):
    pretty_print(html("Modulo $2^%s$"%n))
    a = mod(5,2^n)
    L = [(i,a^i,(a^i).multiplicative_order())
         for i in range(1,a.multiplicative_order())
         if gcd(i,a.multiplicative_order())==1]
    for item in L:
        pretty_print(html(
            r"$%s^{%s}\equiv %s$ has order $%s$ (and $\gcd(%s,%s)=1$)"
            %(a, item[0], item[1], item[2], item[0],
              a.multiplicative_order())))


The interact confirms that this is true; in fact Lemma 10.3.4 should be true 
whether p is prime or not, though I won’t ask you to prove it. 

Lemma 10.3.5 also seems to be working; there are exactly $\phi(8) = 4$ powers
here, each of which has order eight. The problem in deciding if there are
primitive roots, though, is that there might be another element of the same
non-maximal order as the powers of $a$ above which is not one of them! This
code shows them for powers of two.


@interact
def _(n=5):
    pretty_print(html("Modulo $2^%s$"%n))
    a = mod(-5,2^n)
    L = [(i,a^i,(a^i).multiplicative_order())
         for i in range(1,a.multiplicative_order())
         if gcd(i,a.multiplicative_order())==1]
    for item in L:
        pretty_print(html(
            r"$%s^{%s}\equiv %s$ has order $%s$ (and $\gcd(%s,%s)=1$)"
            %(a, item[0], item[1], item[2], item[0],
              a.multiplicative_order())))


We see that in some sense there are ‘extra’ elements with order 8 when
$n = 32$ (confirming Fact 10.3.3 for this $n$). If you have eight elements of
order eight, and obviously at least one element of order 1, in $U_{32}$, then it
is impossible to have the required eight elements of order sixteen that one
would need for there to be a primitive root modulo 32. (Why? Because
$8+1+8 > 16 = |U_{32}|$.) In essence, the fact that this can’t happen for a prime
modulus is why primitive roots do exist in that case.


10.4 Prime Numbers Have Primitive Roots 


We use many of the same techniques and ideas in proving that every prime
number $p$ has a primitive root. Let’s check that this claim is true for at least
some primes.


L = [(p,primitive_root(p)) for p in prime_range(100)]
for item in L:
    print("A primitive root of %s is %s"%(item[0],item[1]))


A primitive root of 2 is 1 
A primitive root of 3 is 2 
A primitive root of 5 is 2 


A primitive root of 97 is 5 
So at least we get a primitive root for the first 25 primes. 


Theorem 10.4.1 Primitive Roots Exist for Primes. Every prime has a
primitive root. In other words, the order $p-1$ group $U_p$ is always cyclic.
Proof. Below, we will actually prove the stronger Claim 10.4.4, which states
that the number of elements of order $d$ (a positive divisor of $p-1$) is
$\phi(d)$. Naturally this will be non-zero for $d = p-1$, which proves the
theorem.

Before we examine the claim, we need some discussion. 


Example 10.4.2 First, it is useful to see what these sets look like for two 
examples — one where we know we have a primitive root, and one where we 
know we don’t. 

Assuming you are online, evaluate the next cell to get the list of sets of 
different order elements for n = 41: 


for d in divisors(40):
    L = []
    for a in range(1,41):
        if mod(a,41).multiplicative_order()==d:
            L.append(a)
    pretty_print(html(r"There are $%s=\phi(%s)$ elements of order $%s$ - "
        %(len(L),d,d)+str(L)))


But here is the list of sets for $n = 32$; there aren’t any for the highest
possible order, and all the other sets have sizes that are exact multiples of
$\phi(d)$.

for d in divisors(euler_phi(32)):
    L = []
    for a in range(1,32):
        if mod(a,2)==1 and mod(a,32).multiplicative_order()==d:
            L.append(a)
    if len(L)==euler_phi(d):
        pretty_print(html(
            r"There are $%s=\phi(%s)$ elements of order $%s$ - "
            %(len(L),d,d)+str(L)))
    else:
        pretty_print(html(
            r"There are $%s\neq\phi(%s)$ elements of order $%s$ - "
            %(len(L),d,d)+str(L)))


As always, doing an entire example manually is very instructive too. 
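If you are offline, the same two tallies can be reproduced with a short plain-Python sketch. The names `multiplicative_order` and `order_counts` are helpers invented for this illustration and simply count by brute force.

```python
from collections import Counter
from math import gcd

def multiplicative_order(a, n):
    """Smallest k >= 1 with a^k = 1 (mod n); assumes gcd(a, n) == 1."""
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k

def order_counts(n):
    """Map each order d to the number of units modulo n having that order."""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    tally = Counter(multiplicative_order(a, n) for a in units)
    return dict(sorted(tally.items()))

print(order_counts(41))   # a prime: compare each count with phi(d)
print(order_counts(32))   # a power of two: order 16 never appears
```

The dictionary for 32 has no key 16 at all, which is another way of saying there is no primitive root.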


For another set of ideas, recall that if $g$ is a primitive root of $p$, by
definition $g^{p-1} \equiv 1$ but no previous positive power is. Assuming $p$ is
an odd prime, then $p-1$ is even, and we could try to separate out the odd and
even powers

$$g^1, g^3, \ldots, g^{p-2} \text{ and } g^2, g^4, \ldots, g^{p-1}$$

and compare them or their products.


Question 10.4.3 Let g be a primitive root of p. 


• Can you see why the inverse of an even power of a primitive root is also
an even power?

• Do you think an odd power (greater than one) of a primitive root $g$ could
be a different primitive root $g'$? Why or why not? What about even
powers of a given primitive root: could they be primitive roots, at least
in principle?


Finally, for those with more experience with groups, a good exercise would 
be to see whether Claim 10.4.4 converts into a statement about the number of 
elements of each order of any cyclic group. 

Now let’s prove our claim. 


Claim 10.4.4 If $p$ is prime, the number of elements of $U_p$ of order $d$ is
$\phi(d)$ (where of necessity $d$ is a positive divisor of $\phi(p) = p-1$).

Proof. Assume that $p$ is prime. For any of the divisors $d$ of $p-1$ (not just
$p-1$ itself), consider the possible number of elements of $U_p$ with that order,

$$|\{a \in U_p \mid a \text{ has order } d\}|.$$

By Lemma 10.3.5, this quantity is clearly between zero and $\phi(d)$. On the
other hand, by Lemma 10.3.4, once we find one $a$ with order $d$, then all the
powers of $a$ coprime to $d$ also have that order (and are distinct), so there are
at least $\phi(d)$ of them.

In particular, the cardinality of the set of elements of $U_p$ of order $d$ is
always either zero or $\phi(d) > 0$, so the entire proof boils down to finding at
least one element $a$ with order $d$ for each potential order $d$. (The reason we
just need to consider $d \mid p-1$ is Theorem 8.3.12 that the order of any
element of a group divides the order of the group, so the only possible orders
of elements in $U_p$ are positive divisors of $p-1$.)

Suppose that at least one of the sets for some divisor $d'$ (such as the set of
primitive roots, if $d' = p-1$) is empty. Then on the one hand, every element
of $U_p$ has some order, so

$$p-1 = \sum_{d \mid p-1} |\{a \in U_p \mid a \text{ has order } d\}| \leq 0 + \sum_{d \mid p-1,\ d \neq d'} \phi(d).$$

On the other hand, Fact 9.5.4 with $n = p-1$ tells us that

$$\sum_{d \mid p-1,\ d \neq d'} \phi(d) < \sum_{d \mid p-1} \phi(d) = p-1.$$

Combining these two inequalities yields $p-1 < p-1$, an absurdity. ∎
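The counting at the heart of this proof is easy to watch in a small case. Taking $p = 7$ (a choice made here only for illustration), the divisors of $p - 1 = 6$ give:

```latex
\begin{align*}
\sum_{d \mid 6} \phi(d) &= \phi(1) + \phi(2) + \phi(3) + \phi(6)
                         = 1 + 1 + 2 + 2 = 6 = |U_7|,\\
\sum_{d \mid 6,\ d \neq 6} \phi(d) &= 1 + 1 + 2 = 4 < 6.
\end{align*}
```

So if $U_7$ had no element of order 6, the remaining orders could account for at most 4 of its 6 elements, which is exactly the contradiction used above.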

The proof above makes it evident that the real place primality is used is in
the crucial lemmas 10.3.5 and 10.3.4. If you are still curious to see how this
works, you can explore more online in the following interact; when there is not
a primitive root, somehow the ‘extra’ elements of $U_n$ which ‘would have’ had
order $\phi(n)$ are distributed nicely among the remaining potential orders.


@interact
def _(n=(25,[0..100])):
    for d in divisors(euler_phi(n)):
        L = []
        for a in range(1,n):
            if gcd(a,n)==1 and mod(a,n).multiplicative_order()==d:
                L.append(a)
        if len(L)==euler_phi(d):
            pretty_print(html(
                r"There are $%s=\phi(%s)$ elements of order $%s$ - "
                %(len(L),d,d)+str(L)))
        else:
            pretty_print(html(
                r"There are $%s\neq\phi(%s)$ elements of order $%s$ - "
                %(len(L),d,d)+str(L)))


10.5 A Practical Use of Primitive Roots 


We will soon begin talking about cryptography and related matters. Before 
we do so, we will preview our computational needs by using primitive roots to 
solve some congruences in a cool way. 

Suppose you want to solve a more involved congruence than the basic ones
we have tackled thus far. A general form that we might want to solve would
look like

$$a^b \equiv c \pmod{n}$$

where either $a$ or $b$ might be a variable, and $n$ would be prime or a prime
power. Here are two examples:

• $x^3 \equiv 5 \pmod{17}$
• $5^x \equiv 17 \pmod{19}$

You can think of the first one as finding a higher root modulo $n$, and the
second one as finding a logarithm modulo $n$.

As we will see below, our general strategy will be to find a primitive root
$g$ of $n$ (when this is possible) and write both as powers of $g$, e.g.
$a = g^i$ and $c = g^j$ for some $i, j \in \mathbb{Z}$. Then our congruence will
become

$$g^{ib} \equiv g^j \pmod{n}$$

and thinking of it as solving in the exponents $ib$ and $j$ will be productive.


10.5.1 Finding a higher root 


With that as introduction, let’s examine one way to solve the first congruence 
using this idea. 

First, find a primitive root modulo 17. Obviously we could just ask Sage 
and its builtin command primitive_root, or use Lemma 10.2.3 with trial and 
error. In the not too distant past, the back of every number theory text had a 
table of primitive roots! 


primitive_root(17)


Now what we will do is try to represent both sides of

$$x^3 \equiv 5 \pmod{17}$$

as powers of that primitive root.

The easy part is representing $x^3$; we just say that $x \equiv 3^i$ for some
(as yet unknown) $i$, so

$$x^3 \equiv (3^i)^3 \equiv 3^{3i}.$$

The harder part is figuring out what power of 3 gives 5. Again, there is no
shortcut, though number theory texts in the past had huge tables of them, and
their powers (for easy reference). In practice, one would have all powers of a
given primitive root available for use ahead of time.


a = mod(3,17)
L = [(i,a^i) for i in range(2,17)]
for item in L:
    if item[1] != 5:
        pretty_print(html(r"$%s^{%s}\equiv %s\not\equiv 5$"
            %(a,item[0],item[1])))
    else:
        pretty_print(html(r"$%s^{%s}\equiv %s$ - hooray!"
            %(a,item[0],item[1])))
        break


By substituting the powers of the primitive root in for $x^3$ and $5$, we
transform

$$x^3 \equiv 5 \pmod{17}$$

into the congruence

$$3^{3i} \equiv 3^5 \pmod{17}.$$

This is a much more familiar type of problem. How would we have solved
this in high school? You would solve it this way, with equations (not
congruences):

$$3^{3i} = 3^5 \Rightarrow 3i = 5 \Rightarrow i = 5/3.$$

We will try to do something very similar here.
What is very important is that this congruence is, in some sense, really no
longer a congruence in $\mathbb{Z}_{17}$. To be precise, everything in sight is
really in $U_{17}$, a cyclic group of order $\phi(17) = 16$. But working in a
cyclic group of order 16 is just the same as thinking modulo sixteen in the
exponents! So we can take out the exponents, just like in precalculus, but do
things (mod 16):

$$3i \equiv 5 \pmod{16}.$$

(See Exercise 10.6.14 to justify doing this manipulation.)

A little guess and check (or more powerful methods earlier in the book)
shows that $i = 7$ suffices, so that $x \equiv 3^7 \equiv 11 \pmod{17}$ is the
solution. And we figured it out without taking every cube out there!

Indeed, doing just that confirms our result. We take all cubes starting at 
2, and the one corresponding to 11 is what we want: 


[mod(i,17)^3 for i in range(2,17)]


[8, 10, 13, 6, 12, 3, 2, 15, 14, 5, 11, 4, 7, 9, 16] 


Note the use of range from Sage note 2.1.3. Why do you think we used it 
here? 


Example 10.5.1 If we change the congruence to a fourth power
$x^4 \equiv 5 \pmod{17}$, the only change is that now we have to solve
$4i \equiv 5 \pmod{16}$. However, there are no such solutions since
$\gcd(4,16) = 4 \nmid 5$, and we confirm this by seeing that 5 does not show up
in this list:


[mod(i,17)^4 for i in range(2,17)]


Example 10.5.2 Finally, let’s try solving the closely related
$x^3 \equiv 7 \pmod{19}$. Here, a primitive root is 2, and it turns out that
$2^6 \equiv 7$, so we may attempt a solution. We obtain

$$2^{3i} \equiv 2^6 \pmod{19} \Rightarrow 3i \equiv 6 \pmod{18},$$

which definitely does have solutions.
In fact, there are three solutions $(2, 8, 14)$ to the reduced congruence

$$i \equiv 2 \pmod{6}$$

so there are three solutions $(2^2, 2^8, 2^{14})$ to the original congruence.
Let’s check this:


a = mod(2,19)
[(a^b)^3 for b in [2, 8, 14]]


A similar strategy can work for higher degree congruences. (See [E.2.4,
Theorem 8.17] for a general statement on when such solutions exist, which we
will omit for the sake of space.)


Example 10.5.3 If we try solving $x^6 \equiv 8 \pmod{49}$, we’ll need a
primitive root of 49; 3 works. I can find out what power $3^i$ of 3 yields 8:

x = mod(primitive_root(49),49)
L = [(i,x^i) for i in range(2,euler_phi(49))]
for item in L:
    if item[1] != 8:
        pretty_print(html(r"$%s^{%s}\equiv %s\not\equiv 8$"
            %(x,item[0],item[1])))
    else:
        pretty_print(html(r"$%s^{%s}\equiv %s$ - hooray!"
            %(x,item[0],item[1])))
        break


Looks like it’s $3^{36}$. So we write $x \equiv 3^i$ for some as yet unknown
$i$, and get

$$3^{6i} \equiv 3^{36} \pmod{49},$$

which gives us

$$6i \equiv 36 \pmod{\phi(49) = 42}$$

which itself reduces to

$$i \equiv 6 \pmod{7}.$$

So $i = 6, 13, 20, 27, 34, 41$ all work, which means that
$x \equiv 3^i \equiv 43, 10, 16, 6, 39, 33$ all should work.


[mod(d,49)^6 for d in [43,10,16,6,39,33]]
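The whole procedure of this subsection (find a primitive root, look up the exponent of the right-hand side, solve a linear congruence in the exponents) can be collected into one routine. What follows is a minimal plain-Python sketch under the assumptions that the modulus has a primitive root and that the right-hand side is a unit; `find_primitive_root` and `solve_power_congruence` are names invented here, and everything is done by brute force rather than by the faster methods earlier in the book.

```python
from math import gcd

def totient(n):
    """Count the integers 1 <= a <= n that are coprime to n (brute force)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def find_primitive_root(n):
    """Smallest primitive root of n by brute force, or None if there is none."""
    phi = totient(n)
    for g in range(1, n):
        if gcd(g, n) != 1:
            continue
        # g is a primitive root when g^1, ..., g^phi are all different units.
        if len({pow(g, k, n) for k in range(1, phi + 1)}) == phi:
            return g
    return None

def solve_power_congruence(b, c, n):
    """All x modulo n with x^b = c (mod n).
    Assumes gcd(c, n) == 1 and that n has a primitive root."""
    phi = totient(n)
    g = find_primitive_root(n)
    index = {pow(g, j, n): j for j in range(phi)}   # table: g^j -> j
    j = index[c % n]
    # Writing x = g^i turns the problem into b*i = j (mod phi).
    exponents = [i for i in range(phi) if (b * i - j) % phi == 0]
    return sorted(pow(g, i, n) for i in exponents)

print(solve_power_congruence(3, 5, 17))   # the cube root example
print(solve_power_congruence(4, 5, 17))   # Example 10.5.1
print(solve_power_congruence(3, 7, 19))   # Example 10.5.2
print(solve_power_congruence(6, 8, 49))   # Example 10.5.3
```

An empty list corresponds to the case where the linear congruence in the exponents has no solution, as in Example 10.5.1.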


10.5.2 Discrete logarithms 
Similarly, we can try to solve logarithmic examples like

$$5^x \equiv 17 \pmod{19}.$$

Indeed, solving this problem is an example of what is called a discrete
logarithm problem. Such problems are apparently very, very hard to solve
quickly, but (!) no one has ever actually proved this.


Example 10.5.4 Let’s solve $5^x \equiv 17 \pmod{19}$. As we noted in
Example 10.5.2, a primitive root modulo 19 is 2, and we can check that
$5 \equiv 2^{16} \pmod{19}$ and $17 \equiv 2^{10} \pmod{19}$. Then, replacing
these, we see that

$$2^{16x} \equiv 2^{10} \pmod{19}$$

yields

$$16x \equiv 10 \pmod{18}.$$

Since each of the numbers in this latter congruence is even, we can reduce this
to $8x \equiv 5 \pmod{9}$, which further reduces to the easy-to-solve
$-x \equiv 5 \pmod{9}$. Taking $x \equiv -5 \equiv 4$, and keeping in mind the
original modulus of 18, that suggests that we could let $x = 4, 13$ in solving
the original congruence. And indeed

$$5^4 \equiv 5^{13} \equiv 17 \pmod{19}$$

as desired:

mod(5,19)^13, mod(5,19)^4


(17, 17)

Sage note 10.5.5 Reminder on equality. To check whether two things
are equal, remember that you can just use == with the two expressions and see
if you get True or False.


Example 10.5.6 Let’s try to solve $16^x \equiv 13 \pmod{19}$.

Again, 2 is a primitive root of 19, and obviously $16 = 2^4$. It might look
harder to represent 13; of course we could do it with the computer, but note
that $13 + 19 = 32 = 2^5$. Sometimes we really can do them by hand!

Thus our congruence becomes

$$2^{4x} \equiv 2^5 \pmod{19}$$

which yields

$$4x \equiv 5 \pmod{18}.$$

However, since $\gcd(4, 18) = 2 \nmid 5$, by Proposition 5.1.1 this latter
congruence has no solutions, so neither does the original congruence. (It turns
out that 16 has only order 9 as an element of $U_{19}$, and evidently 13 is not
one of the elements in the subgroup generated by 16.)
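For moduli this small, the answers in Examples 10.5.4 and 10.5.6 can also be confirmed by simply trying every exponent. To be clear, that is brute force and not the primitive-root method used in the text; the helper name `discrete_logs` is made up for this sketch.

```python
def discrete_logs(base, target, p):
    """All exponents x in 0, ..., p-2 with base^x = target (mod p),
    found by trying each exponent in turn (fine for tiny p only)."""
    return [x for x in range(p - 1) if pow(base, x, p) == target % p]

print(discrete_logs(5, 17, 19))    # Example 10.5.4
print(discrete_logs(16, 13, 19))   # Example 10.5.6
```

The search takes about $p$ steps, which is hopeless for primes of cryptographic size; that gap is what makes the discrete logarithm problem interesting.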


10.6 Exercises 


1. Find primitive roots of 18, 23, and 27 (one for each modulus) using
Lemma 10.2.3 to test various numbers.

2. If $a$ is a primitive root of $n$, prove that $a^{-1}$ is also a primitive
root of $n$.

3. Show that there is no primitive root for $n = 8$.

4. Show that there is no primitive root for $n = 12$.

5. Find two primitive roots of 81 using the Euler $\phi$ criterion Lemma 10.2.3
(that is, by hand).

6. Prove Lemma 10.3.4. Suppose $p$ is prime and the order of $a$ modulo $p$ is
$d$. Prove that if $b$ and $d$ are coprime, then $a^b$ also has order $d$ modulo
$p$. Hint: actually write down the powers of $a^b$, and figure out which ones
could actually be 1. Lagrange’s (group) Theorem 8.3.12 could also be
useful.

7. Prove Lemma 10.3.5. Suppose $p$ is prime and $d$ divides $p-1$ (and hence is
a possible order of an element of $U_p$). Prove that at most $\phi(d)$
incongruent integers modulo $p$ have order $d$ modulo $p$. Hint: If there aren’t
zero such integers, then there is at least one, which solves
$x^d \equiv 1 \pmod{p}$; now use the ideas in the proofs in Section 10.2 and
Section 10.3.

8. Find the orders of all elements of $U_{13}$, including of course the
primitive roots, if they exist. Then verify Claim 10.4.4 for $p = 13$.

9. Challenge: Assuming p is prime, and without using Claim 10.4.4, prove 
that there are exactly ¢(p — 1) primitive roots of p if there is at least one. 


10. Finish the proof of Fact 10.3.6 for the case of composite n. 


11. Challenge: Assume that $a$ is an odd primitive root modulo $p^e$, where $p$
is an odd prime (that is, both $a$ and $p$ are odd). Prove that $a$ is also a
primitive root modulo $2p^e$.


12. Solve x® = 4 (mod 29). 
13. Solve $x^4 \equiv 4 \pmod{99}$ by writing this as the combination of two
congruences which can be solved with primitive roots, and then using
Subsection 5.4.1 to put them back together.

14. Prove this crucial key to solving congruences by looking at the exponents
in Section 10.5: If $x \equiv y \pmod{\phi(n)}$ and $\gcd(a,n) = 1$, show that
$a^x \equiv a^y \pmod{n}$. Hint: Theorem 9.2.5.


Exercise Group. Find all solutions to the following. Making a little table 
of powers of a primitive root modulo 23 first would be a good idea. 


15. x? =2 (mod 23) 16. 3° = 2 (mod 23) 
17. «* =2 (mod 23) 18. 13° =5 (mod 23) 
19. 32° = 1 (mod 23) 20. 3214 = 2 (mod 23) 
21. For which positive integers a is the congruence az* = 2 (mod 13) solvable? 


22. Conjecture what the product of all primitive roots modulo p (for a prime 
p > 3) is, modulo p. Prove it! (Hint: one of the results in Subsection 10.3.2 
and thinking in terms of the computational exercises might help.) 


10.7 All the Primitive Roots 


There is more to the primitive root story, but we won’t cover the rest in detail.
The complete story of which $n$ have groups of units $U_n$ that are cyclic is
given by Sage. Recall from Sage note 5.3.8 that the question mark gives us
information.


a = pari(5)
a.znprimroot?


Signature: a.znprimroot()
Docstring:
   Returns a primitive root (generator) of
   (\mathbb{Z}/n\mathbb{Z})^*, whenever this latter group is cyclic
   (n = 4 or n = 2p^k or n = p^k, where p is an odd prime and k >= 0).
   If the group is not cyclic, the result is undefined. If n is a prime
   power, then the smallest positive primitive root is returned. This
   may not be true for n = 2p^k, p odd.

   Note that this function requires factoring p-1 for p as above, in
   order to determine the exact order of elements in
   (\mathbb{Z}/n\mathbb{Z})^*: this is likely to be costly if p is
   large.
Init docstring:
File:           /tmp/...
Type:           method


Notice that we already showed that bigger powers of two do not have prim- 
itive roots, so we have seen parts of both what does and what doesn’t have a 
primitive root. 

To make this result somewhat more plausible, the following cell demon- 
strates Exercise 10.6.11 — that an odd primitive root for a prime power is also 
a primitive root for twice that modulus. 




@interact
def _(n=(7^2, prime_range(100)+[i^2 for i in prime_range(3,25)]
        +[i^3 for i in prime_range(3,12)])):
    a = mod(primitive_root(n),n)
    if mod(a,2)==0:
        for i in range(1,euler_phi(n)):
            if gcd(i,euler_phi(n))==1:
                a = a^i
                if mod(a,2)==1:
                    break
    pretty_print(html("$%s$ is a primitive root of $%s$, hence has order $%s$"
        %(a,n,euler_phi(n))))
    pretty_print(html(r"The order of $%s$ in $\mathbb{Z}_{%s}$ is also $%s$"
        %(a,2*n,mod(a,2*n).multiplicative_order())))
    pretty_print(html("Compare the powers:"))
    print([a^i for i in range(1,euler_phi(n)+1)])
    a = mod(a,2*n)
    print([a^i for i in range(1,euler_phi(2*n)+1)])


This is also consistent with what we already know, since
$\phi(2p^e) = \phi(p^e)$. Do the patterns in the interact help you think how you
might solve the exercise?

Finally, to really stretch yourself, how do you think you would get from a
primitive root modulo $p$ to one modulo $p^e$? How would you show that other
numbers do not have one?
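The statement of Exercise 10.6.11 can be spot-checked for every odd primitive root of a few small prime powers, not just the one that the interact picks. This is a plain-Python sketch with brute-force helpers named for this illustration only; the moduli listed are an arbitrary sample, and of course a finite check is evidence rather than a proof.

```python
from math import gcd

def totient(n):
    """Count the integers 1 <= a <= n that are coprime to n (brute force)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def multiplicative_order(a, n):
    """Smallest k >= 1 with a^k = 1 (mod n); assumes gcd(a, n) == 1."""
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k

def is_primitive_root(a, n):
    """True when a is a unit modulo n of the full order phi(n)."""
    return gcd(a, n) == 1 and multiplicative_order(a, n) == totient(n)

for n in [7, 9, 25, 27, 49]:
    # Odd primitive roots of the prime power n, viewed as integers below n.
    odd_roots = [a for a in range(1, n, 2) if is_primitive_root(a, n)]
    print(n, odd_roots, all(is_primitive_root(a, 2 * n) for a in odd_roots))
```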


Summary: Primitive Roots 


This chapter uses groups to uncover one of the most profound insights of 
Figure 10.0.1. 


1. We begin by defining primitive roots in Definition 10.1.1, and immedi- 
ately recharacterizing in terms of group theory in Proposition 10.1.4. 


2. A simpler way to test for whether a number is a primitive root is 
Lemma 10.2.3. 


3. In the next section we see some examples of numbers which do not have
primitive roots. More importantly, we tackle the key Lemmas 10.3.4 and
10.3.5 to understand the group $U_p$ for $p$ prime. An example of something
we win from this is Fact 10.3.6.


4. Then we prove the famous result that Primitive Roots Exist for Primes. 


5. We conclude the chapter by using primitive roots to help solve interest- 
ing congruences, like higher degree polynomials in Subsection 10.5.1 and 
discrete analogues of the logarithm in Subsection 10.5.2. 


There is the usual variety of Exercises, and a short appendix about All the 
Primitive Roots. 
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Chapter 11 


An Introduction to Cryptography


We are now ready for some applications. This chapter introduces cryptography,
as well as the prototype for a cool mathematical encryption system and
other similar topics. In Chapter 12, we will also discuss practical issues in
implementing these, namely finding huge primes and factoring huge composite
numbers.

By ‘huge’ I mean something substantially bigger than the output of the 
following commands. 


print(next_prime(randrange(2^100)))
print(next_prime(randrange(2^200)))


82823055428384472362413881743 
760484670368065451826384290635929664594981544625732757532239 
Those are peanuts by today’s standards. But with the tools we’ve developed 
up to this point, we are ready for them. 


11.1 What is Cryptography? 


Cryptography is not just the science of making (and breaking) codes, as a 
dictionary might have it. It is the mathematical analysis of the tools of secrecy, 
from both the perspective of someone keeping a secret and that of the person 
trying to figure it out. Sometimes it is also called cryptology, while sometimes 
that term is reserved for a wider meaning. 

There are two kinds of codes. 


• There are codes which disguise information and are intended to remain
secret! (Especially for those needing private communication.)

• There are codes encapsulating information in a convenient format, not
needing secrecy. (Especially to allow for error checking.)


Mathematicians use the word code to indicate information is being stored, 
reserving the term cipher to talk about a way to protect that information. So, 
what we do when learning about this is some of each, though mostly about 
ciphers. 
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11.1.1 Encoding and decoding 


There are many ways to encode a message. The easiest one for us (though 
not used in practice in exactly this way) will be to simply represent each letter 
of the English alphabet by an integer from 1 to 26. It is also easy to represent 
both upper- and lowercase letters from 1 to 52. 

We’ll use the following embedded cell to turn messages into numbers and 
vice versa. You encode a plaintext message (no spaces, in quotes, for our 
examples) and decode a positive integer. 


def encode(s): # Input must be in quotes!
    s = str(s).upper()
    return sum((ord(s[i])-64)*26^i for i in range(len(s)))


def decode(n):
    n = Integer(n)
    list = []
    while n != 0:
        if n%26==0:
            list.append(chr(64+26))
            n -= 1
        else:
            list.append(chr(n%26+64))
        n //= 26
    return ''.join(list)


Sage note 11.1.1 Definitions. This cell should not have any output. The
code def followed by a function name and input variable name (and colon) just
tells Sage to define a new (computer, not necessarily mathematical) function.
Then the commands after the first line of each definition say what to do,
including what to send back to the user, the return statement. As long as
nothing goes wrong, no output is required; you told Sage to do something, and it
did it.

This is a very handy way to make new mathematical functions too. Even
something as basic as def f(x): return x^2 could be useful, though in this
simple case Sage gives you many more tools if you use the syntax f(x) =
x^2 instead. Try to watch the Sage code throughout, especially in the final
few chapters like Section 23.3, for usage of the def statement to make new
functions.

Let’s try to encode the letter “q”. 


encode('q') 


17 


Sage note 11.1.2 Always evaluate your definitions. If the previous cell 
doesn’t work, then you may need to evaluate the first one in this section again. 
If anything in this chapter ever gives a NameError about a global name encode, 
you probably need to reevaluate some previous cell. Most likely, the one with 
def encode! 

The process of decoding (or to decode) is similar. 


decode (17) 


'Q'
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This should be straightforward. Too straightforward, perhaps. What are 
some issues here? 


• First, notice that I didn’t bother separating lower and uppercase letters.

• Also, no matter how complicated you get, with just a one-to-one
correspondence, there are only a few possibilities for each letter. So if you
know the human language in question, you can just start guessing which
encrypted number stands for its most common letter.

• Can you think of other drawbacks? (See Exercise 11.8.14.)


That means that, in practice, we need to do a few other things. One thing
that is commonly done is to make longer blocks of letters, and then turn those
into numbers. After all, presumably there are so many possible three-letter (or
longer) blocks of letters in English that guessing them is no longer easy.
(Can you think of exceptions, though?)

For pairs, we will represent the first letter as a number from 1 to 26, and the
second letter as 26 times the letter number (think of it as base 26). Remember
that A=1, B=2, etc.

Now compare the following two encodings of “The best day of this year”
and see which one might be easier to decipher.

[encode(letter) for letter in 'Thebestdayofthisyear']


print(encode('cb'))
print(decode(3+26*2))


55
CB


[encode(pair) for pair in
['Th','eb','es','td','ay','of','th','is','ye','ar']]


[228, 57, 499, 124, 651, 171, 228, 503, 155, 469] 


Whereas there are many 5s in the first encoding, which you could guess were
Es, the second one has only one repeat (though knowing English, one might
guess it was ‘Th’). Even so, it’s important to point out that we haven’t made
anything secret yet; we’ve just encoded.

With three-letter blocks, there are then already $26^3 = 17576$ possibilities.


print(encode('zab'))
print(decode(26+1*26+2*26^2))


1404 
ZAB 


One could use this to encode the phrase INT HEB EGI NNI GWA STH
EWO RDX, which comes from a famous quote. In this case, we use an extra X
to fill out the final block; much more sophisticated filler can be used in real
cryptography.

To be fair, when filler of this type is used, it would more often be used in 
the middle to confuse things. In addition, one might recombine the message 
in various ways. We will, however, usually keep our whole message together as 
one item, since we want to understand the mathematical aspects most, rather 
than real cryptography. 
11.2 Encryption


We will spend most of our time talking about enciphering, or encrypting,
messages. After all, encryption is the difficult part, and its details are what
we want to keep secret.

What is cool about modern ciphers is that we actually expect that any eaves- 
dropper will know how we do the encryption; they just don’t know the key, 
which is the specific numbers we use to perform our mathematical encryption. 

Reversing this process (hopefully only done by the person you want to re- 
ceive your message!) is called decryption. Sometimes you need a different set 
of numbers to decrypt, in which case we distinguish between the encryption 
key and the decryption key. 


Sage note 11.2.1 Reminder to evaluate definitions. Don’t forget to 
evaluate the first cell of commands so we can use words as messages instead of 
just numbers. 


def encode(s): # Input must be in quotes!
    s = str(s).upper()
    return sum((ord(s[i])-64)*26^i for i in range(len(s)))


def decode(n):
    n = Integer(n)
    list = []
    while n != 0:
        if n%26==0:
            list.append(chr(64+26))
            n -= 1
        else:
            list.append(chr(n%26+64))
        n //=26
    return ''.join(list)


11.2.1 Simple ciphers 


In the past, one would usually assume that both the sender and the receiver 
keep their keys secret (seems reasonable!), which is called symmetric key 
cryptography. The symmetry is that both people need to keep it secret. One 
early example of this supposedly goes back to C. Julius Caesar. To encrypt 
a message, first convert it to numbers, and then add three to each number 
(‘wrapping around’ as in modular arithmetic if needed), and convert back to 
letters. 


message='MathIsCool'
secret=[encode(letter) for letter in message]
secret


[13, 1, 20, 8, 9, 19, 3, 15, 15, 12] 


It’s pretty clear that 1=A here, for instance. Now let’s add three to each. 
The second letter should get to 4=D, for instance. 


code=[(x+3)%26 for x in secret]
print(code)
print(''.join([decode(m) for m in code]))


[16, 4, 23, 11, 12, 22, 6, 18, 18, 15]
PDWKLVFRRO 


What did I do here? Again, this is just modular arithmetic, modulo 26, so 
I added 3 mod (26). 


11.2.2 Decryption and inverses 


How will I decrypt it, if I get this mysterious message? Here is the main point 
about mathematical ciphers; they need to come from operations that have 
inverses! So in number theoretic ciphers, they’ll need to come from (somehow) 
invertible operations. 

In this case, the operation is modular addition, which certainly has inverses. 
If your encoded numerical message is x, your key is a, and you are working 
modulo (n), then your encrypted message m is 


$m \equiv x + a \pmod{n}$.


To get $x$ back, you just use the additive inverse to $a$ modulo $n$, which is $-a$.
Since $-3$ is the inverse of 3, this one is easy to decipher.


''.join([decode((x-3)%26) for x in code])


'MATHISCOOL'


We could list the key here as a pair (a,n), with a = 3 and n = 26. 
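
As a hedged illustration, the shift cipher with key (a, n) can be sketched in plain Python as follows. The function names are my own, not Sage's, and the helper to_letters makes one extra assumption: the residue 0 is read as Z (= 26), so that letters near the end of the alphabet still wrap around to a printable letter.

```python
def shift_encrypt(nums, a, n=26):
    """Add the key a to each letter number, working modulo n."""
    return [(x + a) % n for x in nums]


def shift_decrypt(nums, a, n=26):
    """Undo the shift by adding the additive inverse -a modulo n."""
    return [(m - a) % n for m in nums]


def to_letters(nums):
    # Assumption: residue 0 is read as Z (= 26), so every residue has a letter.
    return ''.join(chr(64 + (x if x != 0 else 26)) for x in nums)


secret = [13, 1, 20, 8, 9, 19, 3, 15, 15, 12]    # MATHISCOOL
code = shift_encrypt(secret, 3)
print(code, to_letters(code))
print(to_letters(shift_decrypt(code, 3)))
```

The only design choice here is how to print the residue 0; the arithmetic itself is exactly addition of a and of -a modulo n.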

As noted above, one can do something similar with bigger numbers, in 
blocks of two. In the next Sage cell, the code requires a message with an even 
number of letters; can you make it more flexible? 


message='Mathiscool'
secret=[encode(message[2*i:2*i+2]) for i in range(len(message)//2)]
secret


[39, 228, 503, 393, 327] 


11.2.3 Getting more sophisticated 


Let’s do something a little more interesting to encrypt our ‘secret’ about how 
cool math is. What else has inverses? 
Well, sometimes multiplication mod (n) does! We could make a cipher that
gets m by performing

$m \equiv ax + b \pmod{n}$.


Here, let’s choose a = 5 and b = 18. Since we have blocks of two letters each,
the encoding mechanism could yield numbers up to $26^2 + 26$, so we’ll encrypt
using n = 709, the next prime after that.


n = next_prime(26^2+26)
code=[(5*x+18)%n for x in secret]
print(code)
print(''.join([decode(m) for m in code]))


[213, 449, 406, 565, 235]
EHGQPOSUAI


Now the key is listed as a triple, (a, b, n) = (5, 18, 709). How do we invert
this?

To get from ax + b back to x, ordinarily we would subtract b and then
divide by a. Now we are working over $\mathbb{Z}_n$, so is that possible? We’ll need our
first ‘extra’ condition.


Fact 11.2.2 To make modular encryption by a linear function workable, we
need gcd(a,n) = 1. In that case there is a number $a'$ such that

$a a' \equiv 1 \pmod{n}$,

so we can decode via

$m \mapsto a'(m-b) \equiv x \pmod{n}$.

To decode this particular example, then, we need to first subtract 18, then
multiply by an inverse to 5 (mod 709) (which turns out to be 142):


''.join([decode(142*(x-18)%709) for x in code])


'MATHISCOOL'


You should get ‘MathIsCool’ or whatever message you originally used. For 
convenience, you can use the cell below to do this in just one step, picking your 
own a and b along with your own (even length) message. 


message='hiphiphooray'
a = 5
b = 18
secret=[encode(message[2*i:2*i+2]) for i in range(len(message)//2)]
n = 709
ainv = mod(a,n)^-1
code=[(a*x+b)%n for x in secret]
print(''.join([decode(m) for m in code]))
print(''.join([decode(ainv*(x-b)%n) for x in code]))


YSMPPRVTKUP
HIPHIPHOORAY


The proof of the pudding is in the eating. There’s no way I get the original 
message back unless this works! Can you modify the Sage cell above to break 
your message into groups of three letters instead? 
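
For comparison, here is a small plain-Python sketch of the same linear cipher, working directly with the numbers for ‘Mathiscool’ in blocks of two. The function names are mine, and it assumes Python 3.8 or later, where pow(a, -1, n) computes a modular inverse (and raises ValueError when gcd(a, n) is not 1).

```python
def affine_encrypt(x, a, b, n):
    """Encrypt one block x as a*x + b modulo n."""
    return (a * x + b) % n


def affine_decrypt(m, a, b, n):
    """Subtract b, then multiply by an inverse of a modulo n."""
    a_inv = pow(a, -1, n)       # needs gcd(a, n) = 1
    return (a_inv * (m - b)) % n


a, b, n = 5, 18, 709
secret = [39, 228, 503, 393, 327]          # 'Mathiscool' in blocks of two
code = [affine_encrypt(x, a, b, n) for x in secret]
print(pow(a, -1, n))                       # the inverse of 5 modulo 709
print(code)
print([affine_decrypt(m, a, b, n) for m in code] == secret)
```

Trying a value of a that shares a factor with n (after changing n to something composite) is a quick way to watch the inverse step fail.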


11.2.4 Linear algebra and encryption 


There is another way of using blocks of size two or more, which we won’t pursue
very far, but which is a big piece of how standard encryption works (see here¹
and here²). Let’s look at our message again.


message='Mathiscool'
secret=[encode(letter) for letter in message]
secret


[13, 1, 20, 8, 9, 19, 3, 15, 15, 12] 

Now, in blocks of two, I will change my numbers by turning the first one 
into the sum of the numbers modulo 26 and leaving the second one alone. So 
for the second block (20,8), I will change that block to (28,8), which modulo 
26 becomes (2, 8). 


¹ en.wikipedia.org/wiki/Data_Encryption_Standard
² en.wikipedia.org/wiki/Advanced_Encryption_Standard


[(secret[i]+secret[i+1])%26 if i%2==0 else secret[i] for i in range(len(secret))]


[14, 1, 2, 8, 2, 19, 18, 15, 1, 12] 


This turns out to be the same thing as multiplying the corresponding list
of vectors of length two by a matrix!

$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$

To invert this cipher, we would need an inverse to this matrix modulo 26.
(People don’t do something quite so naive, as there aren’t too many inverses
modulo 26, but for our purposes this suffices.)

In any case, this is another connection to the rest of mathematics! And
it is a huge reason why linear algebra over finite algebraic structures is very
important in security.
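
Here is a minimal plain-Python sketch of that matrix point of view, with blocks treated as column vectors of length two. The helper apply_matrix and the names M and M_inv are my own; the inverse matrix is the one that subtracts the second entry from the first, which undoes the sum.

```python
def apply_matrix(M, block, n=26):
    """Multiply a 2x2 matrix M by a length-two column vector, modulo n."""
    (a, b), (c, d) = M
    x, y = block
    return ((a * x + b * y) % n, (c * x + d * y) % n)


M = ((1, 1), (0, 1))         # first entry becomes the sum, second is unchanged
M_inv = ((1, -1), (0, 1))    # undoes M: subtract the second entry from the first

secret = [13, 1, 20, 8, 9, 19, 3, 15, 15, 12]
blocks = [tuple(secret[i:i + 2]) for i in range(0, len(secret), 2)]
code = [apply_matrix(M, blk) for blk in blocks]
print(code)
print([apply_matrix(M_inv, blk) for blk in code] == blocks)
```

Any 2x2 matrix whose determinant is coprime to 26 would work the same way, since that is exactly when an inverse modulo 26 exists.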


11.2.5 Asymmetric key cryptography 


Finally there is another type of encryption, which is rather different. There 
exists the possibility that everybody knows the key to encrypt, while only the 
legitimate person knows how to decrypt. This is called asymmetric key 
cryptography. 

This idea may seem odd. But in practice today, people really do just post
their encryption keys on the Internet! In the live book, this links³ to the public
key of a fairly well-known open-source software advocate, for example.

In theory, anyone who wants to send Person XYZ a secure message could 
use this key, but only Person XYZ can decrypt it — convenient! Such an 
implementation of an asymmetric system is called public-key cryptography, 
although of course it’s only the encryption key that is actually public. 

In this chapter, we will see examples of both symmetric and asymmetric 
systems, but the main point is to lead up to the mathematics of basic public 
key systems. 


11.3 A Modular Exponentiation Cipher 


To prepare for discussion of a famous public-key system, we will first discuss a 
(symmetric) system that leads to it. This system needs yet another invertible 
number theory procedure, one that we have used enough to be quite comfort- 
able with. 

That procedure is modular exponentiation as a cipher. Recall that we have
methods to solve modular exponential congruences (such as using primitive 
roots). That gives us tools sufficient to implement these subtle techniques. 


Sage note 11.3.1 Another reminder to evaluate definitions. Don’t 
forget to evaluate the commands below so we can use words as messages instead 
of just numbers. 


def encode(s): # Input must be in quotes!
    s = str(s).upper()
    return sum((ord(s[i])-64)*26^i for i in range(len(s)))


def decode(n):
    n = Integer(n)
    list = []
    while n != 0:
        if n%26==0:
            list.append(chr(64+26))
            n -= 1
        else:
            list.append(chr(n%26+64))
        n //=26
    return ''.join(list)


³ www.catb.org/esr/gpg-public-key.asc


11.3.1 The Diffie-Hellman method 


In the cell below, we will pick a few numbers relevant to this method. To use 
it, we will need a prime number p, and some legitimate exponent e that won’t 
mess things up too badly. (Also, suppose our secret is still that math is cool.) 

What do I mean by ‘won’t mess things up too badly?’ Recall from
Subsection 10.5.1 that when we solved

$x^3 \equiv 5 \pmod{17}$ as $3^{3i} \equiv 3^5 \pmod{17}$

we ended up in the world of $\phi(17) = 16$ and solved

$3i \equiv 5 \pmod{16}$.

This required a solution $i$ to exist, which wouldn’t happen for all possible
choices of numbers in a congruence!

In order to keep using these ideas easily, we will pick an exponent coprime
to $\phi(p)$.

Now, here is the algorithm (see also Algorithm 11.3.3). I just take my
message (as a number) and raise it to the $e$th power modulo $p$. It’s as simple as
that!

In the cell below, we pick a convenient e and p. 


p=29 # a prime number
e=9 # a number coprime to euler_phi(p)=p-1=28
message='MathIsCool'
secret=[encode(letter) for letter in message]
code=[mod(x,p)^e for x in secret]
print(code)
print(''.join([decode(m) for m in code]))


[5, 1, 23, 15, 6, 11, 21, 26, 26, 12] 
EAWOFKUZZL 

Here I picked p = 29 since it’s close to 26, and more or less arbitrarily
picked an exponent e = 9 (though it does have to be coprime to $28 = \phi(29)$).
Note the steps. I first had to encode “MathIsCool” to numbers. Then I
exponentiated each number in the coded version, modulo 29. To be precise, I
sent each number

$a \mapsto a^9 \pmod{29}$.

Remark 11.3.2 Notice that decoding the secret message code is not so useful 
anymore! (What would we do with the number 28 as an output, for instance?) 
So we usually just stick with the numbers. 

Leaving aside for the moment that the letter A will now have the
unfortunate property that it always stays 1, and hence basically unencrypted (this
is because we are doing a toy example), how on earth would we ever decrypt
this? Do we have a way to invert

$a^9 \pmod{29}$

in any way?

Naturally, we do! We will use exponentiation again to do so. We just need
something that solves

$(a^9)^f \equiv a \pmod{29}$,

or more concisely

$a^{9f} \equiv a^1 \pmod{29}$.

(We can think of $f$ as a power that inverts the original power 9.)
From our discussion in Section 10.5, solving this congruence is tantamount
to solving

$9f \equiv 1 \pmod{28}$

and we know we can find this. In the cell below, we do it computationally, but
you could do this one ‘by hand’.


f=mod(e,p-1)^-1 # the multiplicative inverse mod p-1 (!) to our encryption power
print(f)
print(''.join([decode(x^f) for x in code]))


25 
MATHISCOOL 


This method of encryption is known as the Diffie-Hellman method (named
after its originators, who proposed it in the mid-1970s); see Historical remark 11.4.1
and Historical remark 11.3.5.


11.3.2 A bigger example 


Now we will do a more realistic example of this. Notice how important it was that
we chose an initial exponent e that was coprime to $\phi(p) = p-1$.


message='heymathiscooleverybody'
secret=encode(message)
secret


13044594485924740120065295822374 


For convenience, I’ll just take the next prime bigger than my message. 


p=next_prime(secret)
print(p)
print(factor(p-1))


13044594485924740120065295822453
2^2 * 3^2 * 11 * 17 * 8273 * 234219716629408326624607


Next, I pick an exponent. Not every exponent will work! Beforehand I
factored $p-1$ so I could find something coprime to it.


e=10103 # a number coprime to p-1
code=mod(secret,p)^e
code


9687827625907130820812107474110 
The encrypted message is now just one number. Now we need the
decryption key. Luckily, that’s just as easy as taking an inverse modulo $p-1$:


f=mod(e,p-1)^-1
print(f)
print(''.join(decode(code^f)))


5098792796685815968933767514883 
HEYMATHISCOOLEVERYBODY 


Here is one more extended Sage example; why not try your own message?
Here, the interesting point is that I allow Sage to pick a prime for me using
next_prime(). (If it fails, try changing e to something coprime to $p-1$.)


message='mathisreallycoolanditshouldntbeasecret'
secret=encode(message)
p=next_prime((secret)*5)
e=677 # hopefully coprime to p-1
code=mod(secret,p)^e
f=mod(e,p-1)^-1
pretty_print(html("My encoded message is $%s$"%secret))
pretty_print(html("A big prime bigger than that is $%s$"%p))
pretty_print(html("And I chose exponent $%s$"%e))
pretty_print(html("The encrypted message is $%s$"%code))
pretty_print(html("The inverse of $%s$ is $%s$"%(e,f)))
pretty_print(html("And the decrypted message turns out to be:"))
print(''.join(decode(code^f)))


11.3.3 Recap 


Here is the formal explanation of our first awesome encryption scheme. 


Algorithm 11.3.3 Diffie-Hellman Encryption. To encrypt using this 
method, do the following. 


• Turn your message into a number $x$.
• Pick a prime $p$ (presumably greater than $x$).
• Pick an exponent $e$ such that $\gcd(e,p-1)=1$.
• Encrypt to a secret message by taking

$m \equiv x^e \pmod{p}$.

Here are the steps for decryption.
• Find an inverse modulo $p-1$ to $e$, and call it $f$.
• Decrypt (if you want) by taking

$m^f \equiv x \pmod{p}$.

• Celebrate in your opponent’s destruction.

Proof. Why does this work? First, note that our condition on $f$ is equivalent
to

$ef \equiv 1 \pmod{p-1}$.

Then, since exponents modulo $p$ live in the world of congruences modulo $p-1$
(by Fermat’s Little Theorem), we can simply compute that

$m^f \equiv (x^e)^f \equiv x^{ef} \equiv x^1 \equiv x \pmod{p}$

which verifies that we get the original message back. □
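
The steps of the algorithm can be sketched in plain Python, here with the toy values p = 29 and e = 9 from earlier in this section. The function names are my own, and the sketch assumes Python 3.8 or later for the modular inverse via pow(e, -1, p - 1).

```python
from math import gcd


def exp_encrypt(x, e, p):
    """Encrypt x by raising it to the e-th power modulo the prime p."""
    if gcd(e, p - 1) != 1:
        raise ValueError("e must be coprime to p - 1")
    return pow(x, e, p)


def exp_decrypt(m, e, p):
    """Decrypt by raising to f, an inverse of e modulo p - 1."""
    f = pow(e, -1, p - 1)
    return pow(m, f, p)


p, e = 29, 9
secret = [13, 1, 20, 8, 9, 19, 3, 15, 15, 12]    # MATHISCOOL, letter by letter
code = [exp_encrypt(x, e, p) for x in secret]
print(code)
print(pow(e, -1, p - 1))                          # the decryption exponent f
print([exp_decrypt(m, e, p) for m in code] == secret)
```

The three-argument form of pow reduces modulo p at every step, so the same sketch should stay fast even for primes with many digits.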


Feel free to use the following Sage cells to see what happens with your own 
short messages. 


@interact
def _(message='mathiscool',e=677):
    secret=encode(message)
    p=next_prime(100*(secret))
    if gcd(e,p-1) != 1:
        pretty_print(html("Looks like $%s$ isn't coprime to the prime! Try another one."%e))
    else:
        code=mod(secret,p)^e
        try:
            f=mod(e,p-1)^-1
        except ZeroDivisionError:
            pretty_print(html("Looks like $%s$ is not coprime to the prime we chose, $%s$"%(e,p)))
        pretty_print(html("My encoded message is $%s$"%secret))
        pretty_print(html("A big prime bigger than that is $%s$"%p))
        pretty_print(html("And I chose exponent $%s$"%e))
        pretty_print(html("The encrypted message is $%s$"%code))
        pretty_print(html("The inverse of $%s$ is $%s$"%(e,f)))
        pretty_print(html("And the decrypted message turns out to be:"))
        print(''.join(decode(code^f)))


Or you can choose a prime on your own. 


@interact
def _(message='hi',p=991,e=677):
    secret=encode(message)
    # e must be coprime to p-1 (not just to p) for the inverse f to exist
    if is_prime(p) and gcd(p-1,e)==1 and p>secret:
        code=mod(secret,p)^e
        try:
            f=mod(e,p-1)^-1
        except ZeroDivisionError:
            pretty_print(html("Looks like $%s$ is not coprime to the prime we chose, $%s$"%(e,p)))
        pretty_print(html("My encoded message is $%s$"%secret))
        pretty_print(html("A big prime bigger than that is $%s$"%p))
        pretty_print(html("And I chose exponent $%s$"%e))
        pretty_print(html("The encrypted message is $%s$"%code))
        pretty_print(html("The inverse of $%s$ is $%s$"%(e,f)))
        pretty_print(html("And the decrypted message turns out to be:"))
        print(''.join(decode(code^f)))
    elif not is_prime(p):
        pretty_print(html("Pick a prime $p$!"))
    elif p <= secret:
        pretty_print(html("Make sure your prime is bigger than your secret, $%s$"%secret))
    else:
        pretty_print(html("Make sure that $gcd(p-1,e)=1$!"))


Sage note 11.3.4 Compute what you need. Remember, you can always 
compute anything you need. For instance, if you for some reason didn’t pick a 
big enough prime, you can use the following command to find one. 


next_prime(11058)


11059 


Historical remark 11.3.5 Diffie and Hellman. In 2015, Whitfield Diffie
and Martin Hellman won the Turing Award, the highest award in computer
science, for their contribution.


11.3.4 A brief warning 


Remember, the key that makes it all work (thanks to Fermat’s Little Theorem/
Euler’s Theorem) is that exponents of congruences mod $n$ live in the world of
congruences mod $\phi(n)$, as long as they are numbers coprime to $\phi(n)$. That’s
why $\gcd(e,p-1) = 1$ is important.

Here’s an example of how not choosing your exponent wisely can go wrong. 


message='hi' # needs to be in quotes
secret=encode(message)
p=991 # needs to be bigger than secret
e=2 # NOT coprime to p-1
code=mod(secret,p)^e
code


95 


Sage note 11.3.6 Change values right in the code. Some Sage cells 
have little text boxes or sliders for interacting. But you can use any of them 
to change the values we are playing with; try changing the variable message in 
the preceding cell to encode your own secret. 

Assuming you followed along, so far, so good; it got encrypted. But what 
happens when we try to decrypt? 


f=mod(e,p-1)^-1
message,secret,code,decode(code^f) # prints all the steps


Traceback (most recent call last): 


ZeroDivisionError: inverse of Mod(2, 990) does not exist 


You should have gotten an error (in fact, a ZeroDivisionError, which
should sound relevant). It turns out not even to be possible to go
backwards. Be warned that you must know the mathematics to use cryptography
wisely.
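
One way to see why there is no going backwards with e = 2: squaring modulo 991 sends two different numbers to the same place, so no decryption exponent could possibly tell them apart. The following plain-Python sketch (variable names are mine) checks this for the encoding 242 of ‘hi’ and its negative modulo 991.

```python
p, e = 991, 2
x = 242               # the numerical encoding of 'hi'
other = p - x         # a different number with the same square modulo p

# Both "messages" encrypt to the same ciphertext.
print(pow(x, e, p), pow(other, e, p))

# In plain Python the missing inverse shows up as a ValueError
# (Sage reports a ZeroDivisionError instead).
try:
    pow(e, -1, p - 1)
except ValueError as err:
    print("no inverse:", err)
```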


11.4 An Interesting Application: Key Exchange 


There is a quite useful application of Diffie-Hellman called key exchange. In 
fact, this is the original application they had in mind. 


Historical remark 11.4.1 Diffie-Hellman controversy. There is a little 
controversy over exactly whom to credit for originating the concept of public- 
key cryptography. Researchers at the British intelligence unit GCHQ published 
a number of internal papers on methods similar to those in this chapter, and 
Ralph Merkle previously published a paper introducing the notion. However, 
the specific mathematics are due to Diffie and Hellman, who were the first to 
publish in a public venue, so it seems reasonable to keep the traditional name. 


11.4.1 Diffie-Hellman Key Exchange 


Here is the basic concept of key exchange. Two people trying to pass informa- 
tion (universally called Alice and Bob) want to decide on a secret key for using 
some encryption routine. Since all we really care about are the numbers, once 
we’ve encoded, we should just assume the key is a number. 

Unfortunately, Alice and Bob know that someone may be listening in on 
their decision. Instead of trying to send a secret key only one of them has 
chosen, they try to create a secret key together using (essentially) public means. 
Here’s how it works. 


Algorithm 11.4.2 Diffie-Hellman key exchange. Here are the steps. 


• First, Alice and Bob jointly pick a big prime $p$ and a base for
exponentiation $g$, presumably with $1 < g < p$. This doesn’t need to be secret.


• Now, they each secretly choose an exponent; maybe Alice chooses $m$ and
Bob chooses $n$.


• The key step: Each of them exponentiates $g$ to their secret power, modulo
$p$.


• Then they pass off these numbers to each other, and once again
exponentiate the other person’s number to their own secret power, modulo
$p$.


The resulting numbers are the same and give the secret key.
Proof. The two numbers are $(g^m)^n = g^{mn}$ and $(g^n)^m = g^{nm}$, which are the
same, and certainly are so modulo $p$. □


Example 11.4.3 Alice and Bob pick p = 991 and g = 55, and then (separately)
pick m = 130 and n = 123. Then they compute the powers $g^m$ and $g^n$ modulo
$p$.


p=991
g=mod(55,p)
m=130
n=123
Alice_does=g^m
Bob_does=g^n
print("Alice does", Alice_does)
print("Bob does", Bob_does)


Alice does 722
Bob does 114


Alice and Bob have different numbers now, but once each of them raises the
other’s number to their own secret power after the exchange, the numbers
should be the same.


Bob_does^m,Alice_does^n


(877, 877) 


Note the code takes one power to the m and the other power to the n.

Thus, now they have a secret key ($g^{mn} = g^{nm}$) they can easily compute but
which a spy in the middle cannot. Feel free to try this with your own numbers
you pick!


@interact
def _(p=(991,prime_range(1000)),g=55,m=130,n=123):
    g=mod(g,p)
    pretty_print(html("If you jointly picked $p=%s$ and base $g=%s$"%(p,g)))
    pretty_print(html("Then separately picked secret powers $m=%s$ and $n=%s$"%(m,n)))
    pretty_print(html(r"Your publicly traded info would be $%s^{%s}\equiv %s$ and $%s^{%s}\equiv %s$"%(g,m,g^m,g,n,g^n)))
    pretty_print(html(r"But the secret joint key would be $%s^{%s\cdot %s}\equiv %s$"%(g,m,n,g^(m*n))))


This number $g^{mn}$ can now be used in some symmetric encryption system
as a key for both Alice and Bob.
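
The whole exchange fits in a few lines of plain Python, using the numbers from Example 11.4.3. This is only a sketch with variable names of my own choosing; real implementations use far larger primes and carefully chosen bases.

```python
p, g = 991, 55        # public: the prime and the base
m, n = 130, 123       # private: Alice's and Bob's secret exponents

alice_sends = pow(g, m, p)     # Alice publishes g^m mod p
bob_sends = pow(g, n, p)       # Bob publishes g^n mod p

# Each raises the other's public number to their own secret power.
alice_key = pow(bob_sends, m, p)
bob_key = pow(alice_sends, n, p)

print(alice_key == bob_key)    # both arrive at g^(mn) mod p
```

Notice that neither secret exponent is ever transmitted; only the two powers of g cross the wire.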


11.4.2 In the Middle 


Having a key that isn’t directly communicated should help protect from any 
potential Eve who might be listening in. (That’s Eve for eavesdropping, believe 
it or not — also a universal person in these stories.) That is good news. 

On the down side, if Eve is not only listening, but actually has access to
Alice and Bob’s transmissions and can change them, she can still cause trouble.
Eve can in this situation add her own exponent, $\ell$, to the game, so that she
pretends to have secret key $g^{m\ell}$ with Alice and secret key $g^{n\ell}$ with Bob. Both
of their keys’ security is now compromised.

Such a situation is historically known as a “Man in the Middle” attack. 
There is no obvious way to stop such an attack with this algorithm, if Eve has 
that much power. (See Exercise 11.8.5.) 
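
To make the attack concrete, here is a hedged plain-Python sketch. Eve's secret exponent (called ell in the code, with the value 77 picked arbitrarily for illustration) replaces both public numbers in transit, so Alice and Bob each end up sharing a key with Eve rather than with each other.

```python
p, g = 991, 55
m, n = 130, 123       # Alice's and Bob's secret exponents
ell = 77              # Eve's secret exponent (hypothetical value)

A = pow(g, m, p)      # what Alice sends
B = pow(g, n, p)      # what Bob sends
E = pow(g, ell, p)    # what Eve substitutes for both messages

alice_key = pow(E, m, p)        # Alice believes this is shared with Bob
bob_key = pow(E, n, p)          # Bob believes this is shared with Alice

eve_key_alice = pow(A, ell, p)  # Eve's copy of Alice's key
eve_key_bob = pow(B, ell, p)    # Eve's copy of Bob's key

print(alice_key == eve_key_alice, bob_key == eve_key_bob)
```

Eve can now decrypt anything Alice sends, read it, and re-encrypt it with Bob's key before passing it along.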


11.5 RSA Public Key 


Sage note 11.5.1 We keep reminding you. Remember, this cell contains 
the command used to make numbers from letters (and vice versa), so always 
evaluate the cell before doing any en/decoding. 


def encode(s): # Input must be in quotes!
    s = str(s).upper()
    return sum((ord(s[i])-64)*26^i for i in range(len(s)))


def decode(n):
    n = Integer(n)
    list = []
    while n != 0:
        if n%26==0:
            list.append(chr(64+26))
            n -= 1
        else:
            list.append(chr(n%26+64))
        n //=26
    return ''.join(list)


In order to deal with some of the issues of symmetric systems, we will now 
introduce the most famous public-key system. Recall that this means we have 
an encryption key that is easy for anybody at all to use, but is very difficult to 
undo unless you know the secret. (Sometimes this is called a trapdoor system, 
because it’s easy to fall in but it’s hard to get back out unless you know where 
the secret passageway is!) 


Historical remark 11.5.2 Who is RSA? The formal name for the system in
this section is “Rivest, Shamir, Adleman” or RSA, for Ron Rivest, Adi Shamir,
and Leonard Adleman, who developed it in the late 1970s. The acronym
continues to be the name of the security company they cofounded⁴. As with the
Diffie-Hellman protocol, the British intelligence unit GCHQ had also developed it
in earlier (then-classified) documents.


11.5.1 The background 


The idea behind RSA is to make Diffie-Hellman, which relies only upon The- 
orem 7.5.3 and primes, into a system which involves Euler’s Theorem (9.2.5). 
We want to do so, but not so heavily as to make the computation too expen- 
sive. (With the advent of mobile devices, it turns out that this has once again 
become a big issue, so much so that even RSA or similar methods are be- 
ing replaced with more sophisticated ones involving curves like those coming 
from the Mordell equation (recall Section 15.3), known as elliptic curves. See 
[E.4.19] for an excellent full introduction to this at about the level of this text, 
which could help in answering Exercise 25.9.12; a more targeted approach is 
in [E.2.10, Chapter 18.6].)

It turns out that the easiest way to keep computation easy while sticking 
with exponentiation is to choose as a modulus a large integer n with only two 
prime factors, instead of one large prime p as we did before. For instance: 


p=89
q=97
n=p*q
print("Multiply the primes %s and %s to get our modulus %s"%(p,q,n))


Multiply the primes 89 and 97 to get our modulus 8633 


Exponents here live in the world of $\phi(n)$. We can easily compute this using
Fact 9.5.2 (so that $\phi(n) = (p-1)(q-1)$). So the computations are going to
be easy for us, assuming we know $p$ and $q$.

But they will not be so easy to compute without that knowledge, for which
we need to have the prime decomposition of $n$. In particular, for reasonably
large $n$, that means $\phi(n)$ is essentially secret to anyone who isn’t tough enough
to factor $n$.

⁴ www.rsa.com

Remark 11.5.3 At least that’s what people currently believe; if it isn’t true, 
we are in deep trouble security-wise, as we will see later. 

As an example, in the early 1600s, Fermat believed $2^{32}+1$ was prime. It
took until 1732 and the genius of Euler to factor $2^{32}+1$ as follows⁵, which
shows the one hundred sixteenth prime is the smaller of two factors.


2^32+1,factor(2^32+1),nth_prime(116)


(4294967297, 641 * 6700417, 641)


Hence $n = 2^{32}+1$ wouldn’t have been a bad $n$ to choose in the early 1700s,
since it would take a lot of trial and error to get to the one hundred sixteenth
prime!
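
To get a feel for that ‘trial and error’, here is a naive trial-division sketch in plain Python (the function name is mine). A computer gets through the candidate divisors instantly for this n, which is one reason real moduli need to be so much bigger.

```python
def smallest_factor(n):
    """Return the smallest divisor of n greater than 1, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n            # no divisor up to sqrt(n), so n itself is prime


N = 2**32 + 1
d = smallest_factor(N)
print(N, d, N // d)
```

By hand one would only try prime divisors, but the answer is the same either way.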


11.5.2 The practice of RSA 


That’s the preliminaries. From now on, we do exactly the same thing as before,
choosing an $e$ coprime to $\phi(n)$, etc. This time, though, instead of keeping $e$
secret, we let anybody know it (along with $n$, which we have to let people know
anyway).


Example 11.5.4 With the same primes, let’s choose e = 71, because that is
coprime to $\phi(89 \cdot 97) = \phi(89)\phi(97) = 88 \cdot 96 = 8448$.


p=89
q=97
n=p*q
phi=euler_phi(n)
e=71
print("Multiply the primes %s and %s to get our modulus %s"%(p,q,n))
print("Are e=%s and phi(%s)=%s coprime?"%(e,n,phi))
print(gcd(e,phi)==1)


Multiply the primes 89 and 97 to get our modulus 8633
Are e=71 and phi(8633)=8448 coprime?
True 


We compute an inverse mod $\phi(n)$ just as before, which will be (as before)
our decryption key. Since we are able to compute $\phi(n)$, it isn’t hard to get an
inverse for $e$. If you only knew $n$, though, it would be very hard to do this
(for reasonably large $n$); or at least, it is supposed to be hard to compute $\phi(n)$
without factoring $n$, though this has yet to be proven.


f=mod(e,phi)^-1;f


119 


Now, just like with Diffie-Hellman, I raise my message (number) to the 
power e to encrypt, and raise to the power f to decrypt an encrypted message. 
Here are all the steps together! 


⁵ Weil points out in [E.5.8, II.IV] that Fermat had the tools to do this (see the discussion
at the end of Subsection 7.5.2), but apparently just completely neglected to use them, so
convinced was he of his correctness.
@interact
def _(message='hi',p=89,q=97,e=71):
    secret=encode(message)
    n = p*q
    phi = (p-1)*(q-1)
    # e must be coprime to phi(n) (not to n itself) for an inverse to exist
    if gcd(phi,e)==1 and n>secret:
        code=mod(secret,n)^e
        try:
            f=mod(e,phi)^-1
            pretty_print(html("My encoded message is $%s$"%secret))
            pretty_print(html(r"A big product of primes bigger than that is $pq=%s\cdot %s=%s$"%(p,q,n)))
            pretty_print(html(r"(which means my secret $\phi(n)=\phi(%s\cdot %s)=(%s-1)(%s-1)$ is $%s$)"%(p,q,p,q,phi)))
            pretty_print(html("And I chose exponent $%s$"%e))
            pretty_print(html(r"The encrypted message is $%s^{%s}\equiv %s$"%(secret,e,code)))
            pretty_print(html("The inverse of $%s$ modulo $%s$ is $%s$"%(e,phi,f)))
            pretty_print(html("And the decrypted message turns out to be:"))
            print(''.join(decode(code^f)))
        except ZeroDivisionError:
            pretty_print(html(r"Looks like $%s$ is not coprime to $\phi(%s)=%s$"%(e,n,phi)))
    elif gcd(phi,e)!=1:
        pretty_print(html(r"Make sure that $gcd(\phi(n),e)=1$!"))
    elif n <= secret:
        pretty_print(html("My encoded message is $%s$"%secret))
        pretty_print(html(r"Make sure that $pq=%s\cdot %s=%s$ is bigger than your secret"%(p,q,n)))
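
For readers without Sage at hand, the same toy RSA computation can be sketched in plain Python. The variable names follow the text (p, q, n, phi, e, f); the message 242 is the encoding of ‘hi’, and the sketch assumes Python 3.8 or later for pow(e, -1, phi).

```python
from math import gcd

p, q = 89, 97
n = p * q                     # public modulus
phi = (p - 1) * (q - 1)       # easy to compute only if you know p and q
e = 71                        # public exponent, must be coprime to phi

if gcd(e, phi) != 1:
    raise ValueError("e must be coprime to phi(n)")

f = pow(e, -1, phi)           # private decryption exponent
x = 242                       # the message 'hi' as a number, must be below n
m = pow(x, e, n)              # anyone can encrypt with the public key (n, e)

print(n, phi, f)
print(pow(m, f, n) == x)      # only the owner of f can undo it
```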


11.5.3 Why RSA works 


Now we have an encryption method where anyone can encrypt. The modulus
$n$ (not written as $pq$) and $e$ are both published, and anyone who wants to send
a message (as a number smaller than $n$) just exponentiates. You just have to be sure that
$\phi(n)$ and $e$ are coprime for it to be defined properly.


Algorithm 11.5.5 RSA encryption algorithm. In order to encrypt a
message $x$ via RSA with public key $(n,e)$, you do

$x^e \pmod{n}$.

In order for the owner of the key to decrypt a message $m$, they do

$m^{e^{-1}} = m^f \pmod{n}$

for any $f$ solving $ef \equiv 1 \pmod{\phi(n)}$.
Proof. Assume the original message was $x$ and that this is coprime to $n$. Since

$ef \equiv 1 \pmod{\phi(n)}$

we have $ef = k\phi(n)+1$ for some integer $k$. Hence by Euler’s Theorem we have

$(x^e)^f = x^{ef} = x^{k\phi(n)+1} = \left(x^{\phi(n)}\right)^k x^1 \equiv 1^k \cdot x \equiv x \pmod{n}$.

So it all works out: we recover the original message.

Interestingly, because $n = pq$ is a product of different primes, we don’t actually
need the coprime hypothesis for the message, which is nice not to have to check.
Suppose $p \mid x$ but $\gcd(q, x) = 1$, for example. Then modulo $p$ we have $(x^e)^f \equiv x$
because both are zero, while modulo $q$ we do a bit more computation to see

$(x^e)^f = x^{k\phi(n)+1} = x^{k\phi(p)\phi(q)+1} = \left(x^{\phi(q)}\right)^{k\phi(p)} x \equiv x \pmod{q}$.

By (essentially) the Fundamental Theorem of Arithmetic that suffices to show
they are equivalent modulo $n = pq$ as well. (If $pq \mid x$, then $x \equiv 0$ so things
aren’t very interesting.) □
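
A brute-force check of this claim for the toy key is easy in plain Python. The sketch below (variable names are mine) tries every residue modulo n = 89 · 97, including the multiples of p and q that are not coprime to n.

```python
p, q, e = 89, 97, 71
n = p * q
phi = (p - 1) * (q - 1)
f = pow(e, -1, phi)           # decryption exponent (Python 3.8+)

# Encrypt then decrypt every possible message, coprime to n or not.
all_ok = all(pow(pow(x, e, n), f, n) == x for x in range(n))
print(all_ok)

# In particular, a multiple of p survives the round trip.
print(pow(pow(p, e, n), f, n) == p)
```

Of course this is a check of one small example, not a proof; the argument above is what guarantees it in general.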

And if someone nefarious were to try to decrypt this, they would need access
to $f$ somehow, or something equivalent to it mathematically. That would mean
solving

$ef \equiv 1 \pmod{\phi(n)}$

for $f$ without actually knowing what $\phi(n)$ is!

Naturally, that is pretty easy to compute in the cases above. But in real
life?


p=next_prime(randrange(2^50))
q=next_prime(randrange(2^50))
n=p*q # needs to be bigger than secret
print("The first part of my key, %s, is the product of my secret primes"%n)


The first part of my key, 387557680000801386581770958669, 
is the product of my secret primes 


The n in the cell above is the product of two primes, but would you like
to try to compute $\phi(n)$ by hand? Without knowing the actual primes, it could
be very difficult to figure out $\phi(n)$, which you probably need to get f.

Realistic examples have much larger primes than this, say 100 digits. But 
let’s see what would happen next in a ‘real’ example. 


message='mathiscool' # needs to be in quotes
secret=encode(message) # needs to be less than n
print("My message is %s numerically"%secret)


My message is 68408084029415 numerically 


Hopefully the randomness of the p and q I picked didn’t keep n from being 
greater than the numerical value of the message. 

Now we pick the other piece of our key, e. Believe it or not, it doesn’t really
seem to matter (though no one has proved this) what e is. Documentation for
a widely used RSA implementation (OpenSSL, www.openssl.org/) says this:


-F4|-3: The public exponent to use, either 65537 or 3. The default
is 65537.


The documentation used to also recommend 17, which I figure is easier to
use than 65537 but less obvious than 3. Let’s check that it’s coprime to
$\phi(n)$, as the algorithm requires.


phi=euler_phi(n)
e=17 # needs to be coprime to phi
print("And I can check whether e=17 is coprime to phi(%s)"%n)
print(gcd(phi,e)==1)


And I can check whether e=17 is coprime to 
phi (674932867331573648976699887017) 
True 


If you get False above (I did once in a while during testing), then just pick 
a different e. (Only evaluate the following cell if you have to!) 


e=65537 # needs to be coprime to phi
print("Second try - is e=65537 coprime to phi(%s)?"%n)
print(gcd(phi,e)==1)


Second try - is e=65537 coprime to 
phi (674932867331573648976699887017)? 
True 


Once we have our key, away we go! 


code=mod(secret,n)^e
print("My encoded message is %s"%secret)
print("A big product of primes bigger than that is n=%s"%n)
print("And I chose exponent %s"%e)
print("The encrypted message is %s^%s congruent to %s"%(secret,e,code))


My encoded message is 68408084029415 

A big product of primes bigger than that is 
n=674932867331573648976699887017 

And I chose exponent 65537 

The encrypted message is 68408084029415^65537 congruent to
114588857979006420962953343720 


Crack that! Who knows what $\phi(n)$ is?
But if I know it, I can calculate the inverse of e:


f=mod(e,phi)^-1
print("My original primes were %s and %s"%(p,q))
print("So phi(n) = (%s-1)(%s-1) = %s"%(p,q,phi))
print("Which makes f = %s"%f)
print("And the decrypted message turns out to be:")
print(''.join(decode(code^f)))


My original primes were 607345217933711 and 1111283743416647 

So phi(n) = (607345217933711-1) (1111283743416647-1) = 
674932867331571930347738536660 

Which makes f = 668815557671456976556345023213 

And the decrypted message turns out to be: 

MATHISCOOL 
11.6 RSA and (Lack Of) Security


We are now ready to discuss some elementary security issues regarding RSA. 
Remember, we aren’t learning to be security experts here, and far more pow- 
erful techniques are available! But these are some underlying fundamentals. 


Sage note 11.6.1 A final reminder to evaluate definitions. If you’re 
online, don’t forget to evaluate the commands in the Sage cell below so we can 
use words as messages instead of just numbers. 


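def encode(s): # Input must be in quotes!
    s = str(s).upper()
    # Treat the letters A..Z as the digits 1..26 of a base-26 number,
    # with the first letter as the lowest-order digit
    return sum((ord(s[i])-64)*26^i for i in range(len(s)))


def decode(n):
    n = Integer(n)
    letters = []
    while n != 0:
        if n%26==0:
            # A remainder of zero stands for Z (the 26th letter)
            letters.append(chr(64+26))
            n -= 1
        else:
            letters.append(chr(n%26+64))
        n //=26
    return ''.join(letters)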
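For readers working outside Sage, the same encoding can be sketched in plain Python. The only changes are `**` for exponentiation and ordinary Python integers in place of Sage's `Integer`; the function names match the cell above, but this is a translation for experimentation rather than the book's own code.

```python
def encode(s):
    """Read the letters A..Z as digits 1..26 of a base-26 number, lowest digit first."""
    s = str(s).upper()
    return sum((ord(ch) - 64) * 26**i for i, ch in enumerate(s))

def decode(n):
    """Invert encode; a remainder of 0 stands for the letter Z."""
    letters = []
    while n != 0:
        if n % 26 == 0:
            letters.append("Z")
            n -= 1
        else:
            letters.append(chr(n % 26 + 64))
        n //= 26
    return "".join(letters)

print(encode("Cri"))                    # 3 + 18*26 + 9*26**2 = 6555
print(decode(encode("mathiscool")))     # MATHISCOOL
```

Note that any character outside A to Z (spaces, digits, punctuation) would be mapped to a nonsensical digit, which is one of the weaknesses Exercise 14 asks about.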


11.6.1 Beating the man in the middle 


First, remember one problem with Diffie-Hellman key exchange (Section 11.4). 
Someone who can control your messages can actually fake them. This can’t 
happen with public-key systems (at least not as easily). Here’s why. 

Suppose I want to let someone verify I am who I say I am. In a public-key
system, I never need to let f get known, so I encode my signature with f itself
as the exponent!

First, I just turn my signature into a number. I'll just use the first three 
letters in order to keep the encoding small enough to use small primes. 


signature='Cri' 


code=encode(signature) 
print (code) 


6555 


Then I raise it to the power of the secret key f, the inverse of the public 
key e. 


p=89
q=97
n=p*q
phi=euler_phi(n)
e=71
f=mod(e,phi)^-1
secret=mod(code,n)^f
secret


5422 
Now anyone in the world can check my signature by raising this version of
the signature to the public power e modulo n.


print(secret^e)
print(decode(secret^e))


6555 
CRI 


The reason this works is that

$$ef \equiv 1 \pmod{\phi(n)}$$

and ef = fe in a commutative setting:

$$\left(\text{Name}^f\right)^e = \left(\text{Name}^e\right)^f \equiv \text{Name}^1 = \text{Name} \pmod{n}$$


Naturally, implementing this is somewhat more complex in real life (e.g. padding 
is used), but it is one major digital signing method implemented on many se- 
cure systems. 
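The sign-then-verify idea can be sketched in a few lines of plain Python, using the same toy key as above (p = 89, q = 97, e = 71) and the number 6555 that encodes the signature. The function names `sign` and `verify` are hypothetical, Python 3.8 or later is assumed for the modular inverse, and there is no padding or hashing here, so this is an illustration of the arithmetic only.

```python
p, q, e = 89, 97, 71
n = p * q
phi = (p - 1) * (q - 1)
f = pow(e, -1, phi)                 # private exponent, never published

def sign(message, n, f):
    """Only the key owner can do this, since it needs f."""
    return pow(message, f, n)

def verify(message, signature, n, e):
    """Anyone can do this with the public key (n, e)."""
    return pow(signature, e, n) == message % n

message = 6555                      # numerical form of the signature text
signature = sign(message, n, f)

print(verify(message, signature, n, e))             # genuine signature
print(verify(message, (signature + 1) % n, n, e))   # tampered signature
```

Because raising to the e-th power is a bijection modulo n, any altered signature fails the check.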

Interestingly, this concept also can be used in the opposite way⁷. Suppose
that someone sends a message using their public signature as above — a message 
which later turns out to implicate him or her in illegal activity, a scandal, 
offensive behavior, etc. The author may wish to repudiate this message, but 
(at least in principle) the digital signature cannot be repudiated in the same 
way as other types of messages. (Of course, one can always say that one’s 
private key was stolen, so it’s not foolproof!) 


11.6.2 A cautionary tale 


Lest you think we are now completely secure, let me warn you about one 
possible problem. Remember how we said above that it seems not to matter 
too much what e is? Well, that is sort of true, and sort of untrue. 

Suppose we chose to send a message using the following primes and randomly
(maybe) chosen exponent e. (Notice that if $\gcd(e, \phi(pq)) \neq 1$, this code
wouldn’t have worked at all.)


message='hiphop'
secret=encode(message)
p=197108347
q=591324977
e=52665067560570823
n=p*q
phi=(p-1)*(q-1)
code=mod(secret,n)^e
f=mod(e,phi)^-1
print("My encoded message is %s"%secret)
print("A big product of primes bigger than that is pq=(%s)(%s)=%s"%(p,q,n))
print("(which means my secret phi(n)=phi((%s)(%s))=(%s-1)(%s-1) is %s)"%(p,q,p,q,phi))
print("And I chose exponent %s"%e)
print("The encrypted message is %s^%s congruent to %s"%(secret,e,code))
print("The inverse of %s modulo %s is %s"%(e,phi,f))
print("And the decrypted message turns out to be:")
print(''.join(decode(code^f)))


⁷ I am indebted to my colleague, Russ Tuck, for this observation.

My encoded message is 197108322

A big product of primes bigger than that is 
pq=(197108347) (591324977) =116555088756283019 

(which means my secret 
phi (n)=phi ((197108347) (591324977) )=(197108347-1) (591324977-1) 
is 116555087967849696) 

And I chose exponent 52665067560570823 

The encrypted message is 197108322^52665067560570823
congruent to 109598935674432155 

The inverse of 52665067560570823 modulo 116555087967849696 
is 103781564699780695 

And the decrypted message turns out to be: 

HIPHOP 


The above cell just does the RSA algorithm for a particular case, verifying 
it works. 

Now suppose Alice has sent Bob this message using Bob’s impressive RSA 
key (above) of 


(n, e) = (116555088756283019, 52665067560570823). 


Let me impersonate Eve, trying to snoop. On a hunch (or, as [E.2.3] puts it, 
after attending a seminar at a decryption conference), I figure I don’t have 
much to lose by just trying random arithmetic, so I decide to just keep taking 
eth powers of the encrypted text (which was already raised to the eth power 
once). 


trial_decrypt=code
for i in [1..25]:
    trial_decrypt=trial_decrypt^e
    print(''.join(decode(trial_decrypt)))


UUQUIAHESLLQ 
IFTZCXXTCULDA 
HREHHYCUZMWQ 


DNBDDHIMUTSM 
HIPHOP 
CPTAXZGBUIVCA 




What’s this? You should see a meaningful message appear. Eve would 
barely have to do anything to decrypt this! 
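Eve's hunch can be phrased as a small procedure, often called a cycling attack: keep raising to the e-th power until the ciphertext itself reappears, and the value seen just before that is an e-th root of the ciphertext, which is the message. The sketch below is plain Python with a hypothetical helper `cycling_attack`; the toy key (n, e) = (77, 7) is an assumption chosen so the arithmetic is easy to follow, and the last lines rerun the idea on the key from the text.

```python
def cycling_attack(ciphertext, e, n, max_rounds=1000):
    """Raise to the e-th power until the ciphertext comes back.

    Returns (plaintext, rounds), or None if no cycle closes within max_rounds.
    """
    previous = ciphertext
    for rounds in range(1, max_rounds + 1):
        current = pow(previous, e, n)
        if current == ciphertext:
            # previous^e == ciphertext, so previous is the original message
            return previous, rounds
        previous = current
    return None

# Toy example: n = 7*11, phi(n) = 60, and 7 has order 4 modulo 60.
toy_cipher = pow(2, 7, 77)
toy_result = cycling_attack(toy_cipher, 7, 77)
print(toy_result)

# The key from the text; Eve only needs the public data (n, e) and the ciphertext.
n = 197108347 * 591324977
e = 52665067560570823
code = pow(197108322, e, n)
print(cycling_attack(code, e, n, max_rounds=25))
```

Eve never needs to recognize the plaintext by eye here; seeing the ciphertext return is enough to know she has just passed the message.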


11.6.3 The explanation 


This circumstance may seem mysterious, but it really is related to mathematics
we already used a number of times before. Remember that we could find an
inverse for a modulo n by just taking powers of a, because

$$a^{-1} \equiv a^{\phi(n)-1} \pmod{n}$$

Similarly, for any possible message m and public key e, there will always be
some power k of e such that

$$m^{e^k} \equiv m \pmod{n}$$

which certainly happens whenever

$$e^k \equiv 1 \pmod{\phi(n)}$$

For this to happen, we would have to coincidentally have that not only
gcd(e, n) = 1 (which we always pick), but also that $\gcd(e, \phi(n)) = 1$. Then
Euler’s Theorem 9.2.5 says that the order of e modulo $\phi(n)$ is a divisor of
$\phi(\phi(n))$, so we will sometimes find e where that order is small.

Of course, in real life this would only happen randomly, so you could just
protect against it by checking the order of e modulo $\phi(n)$. Here’s how I created
this not-quite-random example!


g = 7 # Pick something coprime to phi(n)
print(gcd(g,phi))
i = mod(g,phi) # look at it mod phi(n)
print(i.multiplicative_order())
print(factor(i.multiplicative_order()))


1 
4567854373940 
2^2 * 5 * 11 * 13 * 37 * 1879 * 22973


j=i^(11 * 13 * 37 * 1879 * 22973) # take it to as high a power as I can to reduce the order
print(j.multiplicative_order()) # make sure this is small
print(gcd(j,phi)) # check we still have the right gcd
print(j)


20 
1 
52665067560570823 


What was the problem here? The issue is that we had an n such that the
group of units modulo $\phi(n)$ (the order of the group of units modulo n) had
elements of tiny order. (Two levels deep here!)

More precisely, we had an n with a $\phi(n)$ such that $U_{\phi(n)}$ had elements of
very small order in it, so that

$$e^{\text{very small order}} \equiv 1 \pmod{\phi(n)}$$

was possible. How can we avoid this?


11.6.4 A solution 


When we found elements of big order (primitive roots, for prime modulus) in
Chapter 10, we relied on having the original modulus p being prime. We did
not tell the whole story, but we did do enough of what happens with other
moduli to know that we should suspect that choosing n factoring as a small
number of primes to powers should make it easy to find elements of big order
in the group of units. (For instance, we saw that a modulus which is a power
of 2 has elements pretty close to being primitive roots.)

And we do know something about $\phi(n)$. Namely, since n = pq is the
product of two primes, we know that $\phi(n) = (p-1)(q-1)$ is also the product
of two numbers. It would be too much to hope for those to be prime! After
all, p − 1 and q − 1 will both be even, since p and q will be odd primes.

However, it’s possible to pick p and q so that $p - 1 = 2p'$ and $q - 1 = 2q'$,
where p' and q' are both prime. In that case

$$\phi(n) = \phi(pq) = \phi(p)\phi(q) = 2p' \cdot 2q' = 4p'q'$$

so that $\phi(n)$ at least is four times a product of (still big) prime numbers.

We will not prove it, but it turns out this is enough to guarantee the
existence of elements of orders p' − 1 and q' − 1 in $U_{\phi(pq)}$, just like we had
elements of order p − 1 in $U_p$. To be precise, we get elements of order

$$\frac{p-1}{2} - 1 \quad\text{and}\quad \frac{q-1}{2} - 1$$

if $\frac{p-1}{2}$ and $\frac{q-1}{2}$ are both prime. Here is an example of this with very small p
and q, where we at least have elements of order four.


n = 7*11
phi = euler_phi(n)
[mod(i,phi).multiplicative_order() for i in [1..phi] if gcd(i,phi)==1]


[1, 4, 2, 4, 4, 2, 4, 2, 2, 4, 2, 4, 4, 2, 4, 2] 


Going backwards, we are looking for prime numbers p', q' such that 2p' + 1
and 2q' + 1 are also prime, and then we use p = 2p' + 1 and q = 2q' + 1 in RSA,
finding an exponent that has big order in $U_{\phi(n)}$. In this example, p' = 5 and
q' = 3.

Such primes p’ and q’ are called Germain primes, for French mathe- 
matician Sophie Germain. The primes p and q are then called safe primes, 
presumably because they might be ‘safe’ to use under some circumstances. 
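A short plain-Python sketch can list small Germain primes and reproduce the orders computed in the Sage cell above. The helpers `is_prime`, `germain_primes`, `multiplicative_order`, and `unit_orders` are illustrative names, and the trial-division primality test is only suitable for tiny inputs.

```python
from math import gcd

def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def germain_primes(limit):
    """Primes r < limit for which 2r + 1 is also prime."""
    return [r for r in range(2, limit) if is_prime(r) and is_prime(2 * r + 1)]

def multiplicative_order(a, m):
    """Order of a in the group of units modulo m; assumes gcd(a, m) == 1."""
    k, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        k += 1
    return k

def unit_orders(m):
    return [multiplicative_order(a, m) for a in range(1, m + 1) if gcd(a, m) == 1]

small_germain = germain_primes(50)
safe_primes = [2 * r + 1 for r in small_germain]
print(small_germain)
print(safe_primes)
print(unit_orders(60))   # phi(7*11) = 60, as in the example above
```

With p = 11 and q = 7 the safe primes come from the Germain primes 5 and 3, and the largest order in the last list is 4 = 5 − 1.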


Historical remark 11.6.2 Sophie Germain. Germain was the only female 
number theorist of note before the twentieth century, and is definitely an im- 
portant figure. She is most well-known today for proving cases of Fermat’s Last 
Theorem and (more importantly) developing a general strategy for attacking 
it for the first time. During Napoleon’s invasion of various German territories, 
she intervened to ensure Gauss’ safety, as she had corresponded with him under 
an assumed name for some time on this problem. Her significant work on an 
early problem in mathematical physics, while eventually winning an award, was 
largely ignored during her lifetime by the French mathematical establishment. 


Research into security of number-theoretic cryptography is ongoing. There are
practical points as well; as just one example, one ePrint⁸ discovered that 0.2%
of a large set of public keys have “secret keys [which] are accessible to anyone
who takes the trouble” to try to find them. Other studies have found even
more, often because of poor randomness.

Another interesting vulnerability is that there is a significant (in practice,
not in theory) chance that two RSA keys will share a (prime) factor. In another
study⁹ it was found that not only did a nontrivial number of apparently
unrelated keys share a factor (enabling their complete factorization), many keys
were the same! These would still be hard to factor, but as the authors say,
“[G]iven cryptographic key sizes, we would not expect to see devices generate
a single duplicated key for the population sizes we examined if the keys were
generated with sufficient entropy.” This chapter is just a small taste of issues
to consider, and no substitute for having a real security professional!


⁸ eprint.iacr.org/2012/064
⁹ factorable.net/weakkeys12.extended.pdf


11.7 Other applications


The methods of Diffie-Hellman and RSA are just the most typical and famous
encryption systems used in introductory number theory texts; there is a huge
amount of active research into the mathematics of cryptography, much of which
uses rather more advanced mathematics. The important point is that we have
observed some of the basic issues to consider in such systems.

A good next system to check out which has mathematics at the same level 
is the El-Gamal system (see Exercise 11.8.12). After reading Chapter 17 you 
may wish to explore the system mentioned in Subsection 17.5.3. For some- 
thing slightly more advanced, see the very brief discussion of elliptic curves in 
cryptography at the beginning of Subsection 11.5.1. 

There are also tons of other cryptographic applications which are not di- 
rectly about encryption. Two of my favorites are finding ways to flip a coin 
over the Internet and how to find out if someone makes more money than you 
without them revealing their actual salary. For now, we just share one secret. 


11.7.1 Secret sharing 


Suppose that a company with a particular trade secret has three employees 
with clearance to know details of this secret process. However, the company 
wants to avoid one of the three being bought off by a competitor and revealing 
it in an act of corporate espionage. 

The company needs to devise a system where, in order to actually gain 
access to the details of the trade secret, one needs two of the people involved. 
In a movie, you would have an impressive safe with three locks; each person 
would have a separate key to one of the locks, and the safe would be constructed 
so that any two of the keys would open it. 

But real cryptography is not the movies! For one thing, the data is probably
electronic, so it’s really something we need to do digitally. Cryptography
provides the perfect way to deal with these issues. What we will do is indeed
give each person a key (a digital encryption key, of course!)¹⁰.


Algorithm 11.7.1 Secret Sharing. Suppose the trade secret is digitally
represented as a large number K. Here are steps to create three different keys
so that access to any two of these will allow access to K.


• Choose some prime p > K.
• Choose three numbers $m_1 < m_2 < m_3$ which are:
  ◦ mutually coprime and coprime to p, i.e. $\gcd(m_i, m_j) = 1$ and $\gcd(m_i, p) = 1$,
  ◦ AND such that $m_1 m_2 > p m_3$.
• Let $M = m_1 m_2$.
• Now choose some t < M/p at random. Then the keys are as follows:
  ◦ We have a modified secret $K_0 = K + tp$.
  ◦ Person i gets the key $k_i \equiv K_0 \pmod{m_i}$.


¹⁰ The following description of this threshold scheme is a simplified exposition based on the
one in the book where I first learned it, [E.2.4, Chapter 7.6]; see [E.4.21, Section 4.6] for a
related scheme. (Wikipedia has decent links if you search for ‘secret sharing Chinese’.)
Proof. What good do these do us? Well, the Chinese Remainder Theorem
allows us to reconstruct $K_0$ modulo $m_i m_j$ with any two keys $k_i$ and $k_j$. That
may not seem like a lot; that just gives us things to within multiples of $m_i m_j$.
But by our choice of $M = m_1 m_2 > p m_3$, we know that $M/p > m_3$ (and hence
$M/p > m_i$ as well). So, as long as t + 1 ≤ M/p,

$$K_0 = K + tp < p + tp = (t+1)p \leq (M/p)p = M$$

And certainly if $K_0 < M$, then $K_0 < m_i m_j$, since M is the smallest such
product. So the Chinese Remainder Theorem allows us to reconstruct $K_0$
uniquely, and then $K = K_0 - tp$!
Finally, note that just one person doesn’t have enough information to get K,
since that just tells that

$$K_0 \equiv k_i \pmod{m_i},$$

so that

$$K = K_0 - tp \equiv \ell \pmod{m_i}$$

remains possible for all $\ell$ modulo $m_i$. ∎
Obviously, we’ll want to see this in action.
Obviously, we’ll want to see this in action. 


Example 11.7.2 Suppose your secret was K = 5. Let’s pick p = 13, and
numbers 17, 19, 16.


K=5 
p=13 
m1 ,m2,m3=17,19,16 


We'll check quickly that $m_1 m_2 > p m_3$:


m1*m2>p*m3


True 


So M = 17 · 19 = 323, and we can pick t = 12 more or less randomly as
being less than M/p = 323/13 ≈ 24.85.


M=m1*m2 

t=12 

print (M) 
print(M/p > t) 


323 
True 


So $K_0 = K + tp = 5 + 12 \cdot 13 = 161$:


K_0=K+t*p
print(K_0)


161 


This gives keys $k_i$, which are $K_0$ modulo $m_i$. Note that in our example,
we can check all the conditions in the proof by hand, but with industrial-size
numbers that would not be possible.


k1,k2,k3 = mod(K_0,m1),mod(K_0,m2),mod(K_0,m3)
print(k1, k2, k3) 


8 9 1 
The three keys are now 8, 9,1 for moduli 17, 19, 16. 
Now let’s actually reconstruct the secret K. First, let’s see that any two
people do have enough information. We do the Chinese Remainder Theorem 
on each pair: 


# First Line: turn modular integers back into integers 
k1, k2, k3 = ZZ(k1), ZZ(k2), ZZ(k3) 

print (CRT (k1,k2,m1,m2)) 

print (CRT (k1,k3,m1,m3)) 

print (CRT (k2,k3,m2,m3)) 


Now we subtract tp from these outcomes. 


161-t*p


Great! 

One might suspect that a lone person, without one of the other secret 
sharers, might be able to just ‘guess’ which of the various solutions was right 
in this very small example. 


print([k1+i*m1 for i in [0..10]])
print([k2+i*m2 for i in [0..10]])
print([k3+i*m3 for i in [0..10]])


[8, 25, 42, 59, 76, 93, 110, 127, 144, 161, 178] 

[9, 28, 47, 66, 85, 104, 123, 142, 161, 180, 199] 
[1, 17, 33, 49, 65, 81, 97, 113, 129, 145, 161] 

As you can see, without all the information it would not be so clear which
is the correct $K_0$. If you get only one chance, you might not want to try to be
lucky!
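The whole example can also be sketched in plain Python (Python 3.8 or later for the modular inverse). The names `make_shares`, `crt_pair`, and `recover` are hypothetical. Two choices here are assumptions rather than part of Algorithm 11.7.1: the code checks directly that $K_0$ is smaller than every product of two moduli, and it recovers K as $K_0$ reduced modulo p, which equals $K_0 - tp$ because 0 ≤ K < p.

```python
from math import gcd
from itertools import combinations

def make_shares(K, p, moduli, t):
    """Return the shares [(k_i, m_i)] of the modified secret K0 = K + t*p."""
    assert 0 <= K < p, "the prime p must exceed the secret K"
    assert all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))
    assert all(gcd(m, p) == 1 for m in moduli)
    K0 = K + t * p
    # Any two shares must pin down K0 uniquely.
    assert K0 < min(a * b for a, b in combinations(moduli, 2))
    return [(K0 % m, m) for m in moduli]

def crt_pair(a1, m1, a2, m2):
    """The x with x = a1 (mod m1), x = a2 (mod m2), 0 <= x < m1*m2; m1, m2 coprime."""
    inverse = pow(m1, -1, m2)
    return (a1 + m1 * ((a2 - a1) * inverse % m2)) % (m1 * m2)

def recover(share_a, share_b, p):
    """Rebuild K0 from two shares, then strip off the multiple of p."""
    (a1, m1), (a2, m2) = share_a, share_b
    return crt_pair(a1, m1, a2, m2) % p

shares = make_shares(K=5, p=13, moduli=(17, 19, 16), t=12)
print(shares)
for pair in combinations(shares, 2):
    print(pair, recover(*pair, p=13))
```

Reducing modulo p means the people holding shares do not need to be told t, only the public prime p.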


As a note, we should point out that this secret sharing method doesn’t 
just protect against someone defecting. It also provides protection against one 
of the three becoming incapacitated somehow. If all three were necessary to 
unlock the secret, the company is one illness or death or resignation away from 
its secret being irretrievably lost without a system of this type. 

Finally, it is not terribly hard to extend this to a system that works by 
sharing a secret among n individuals in such a way that only k of them are 
needed to access the secret. For full details, I recommend [E.2.4, Chapter 7.6];
Example 11.7.2 was originally based on [E.2.4, Example 7.8].


11.8 Exercises 


1. Do all the encryptions and/or encodings in Sections 11.1 and 11.2 ‘by 
hand’. 

2. Encrypt your name using an affine method (ax + b) with key (5, 6, 29)
(don’t worry about letters), and decrypt BXHBI.


3. Create your own ax + b (mod n) system of encryption and bring an
encrypted message to class (or a friend also interested in number theory).


4. Use the Diffie-Hellman method of encryption to encrypt a short (three to
five character) message with a 26 < p < 50 ‘by hand’ (i.e. without Sage
but with a calculator). Be prepared to explain your choice of e and p, and
calculate that $ef \equiv 1 \pmod{p-1}$ by hand.

5. Draw a diagram and show that if Eve has control of both communications
in Diffie-Hellman key exchange (Algorithm 11.4.2), she can intercept and
decrypt all messages.


6. Do this two-parter using Diffie-Hellman modular exponentiation: 


• Suppose you discovered that the message 4363094, where p =
7387543, actually represented the (numerical) message 2718. What
steps might you take to try to discover e?


• Suppose that you discovered in the previous part by hard work that
e = 35. Now quickly decrypt the message 6618138.


7. Pick two primes between 1000 and 2000 and create an RSA public key 
(n,e) for them. What is the decryption key f? Show your work. 


8. Suppose that n = 9211 and e = 539.
• Encrypt a (short) message.


• Find the decryption key f for this situation, and decrypt your message.


• Use f to sign your name!


9. Come up with your own RSA public-key system by choosing p and q 
and e as appropriate, but with n > 10000; then encrypt a short numerical 
message and hand in only the public key (n, e) and the encrypted message. 
(Your instructor’s job will be to crack it!) 


10. Construct a secret and share it in the way described in Algorithm 11.7.1. 


11. Learn about a symmetric key cryptosystem in common use. Do you own 
any devices which use it? 

12. Learn about the El-Gamal public key encryption method. How is it im- 
plemented? What mathematics used there is similar to what is used in 
this chapter? What is different? 


13. Learn about the Advanced Encryption Standard. How is the mathematics 
used there different from what is used in this chapter? 


14. Examine the code for encode and decode throughout, or have your instruc- 
tor explain it. If you were trying to encode real human communication, 
what improvements would you like to make to these? Could you imple- 
ment them, and how? 

15. In Example 11.7.2, explain mathematically the necessity of the Sage com- 
ment # First Line: turn modular integers back into integers just 
before the invocation of the Chinese Remainder Theorem with CRT. 


Summary: An Introduction to Cryptography 


A major application of number theory is ensuring privacy of many different 
types of communication. This chapter introduces the mathematics of cryptog- 
raphy at the level we have reached thus far. 


1. We begin with a brief, non-rigorous introduction to Encoding and decod- 
ing, as distinct from encryption. 


2. We then dive into a few mathematically elementary Encryption techniques
using congruences, keeping the mathematics as the main focus.


3. A first method which helps motivate the mathematics of public-key methods
is Diffie-Hellman Encryption.


4. This is immediately used to show a real application: Diffie-Hellman key 
exchange. 


5. The next long section gives a lot of detail about the most famous public 
key method, the RSA encryption algorithm. 


6. In Section 11.6 we then examine some of the mathematical weaknesses 
of RSA, including the notions of Germain and safe primes. 


7. There are many other interesting topics in the practice of cryptography, 
but we only cover Secret Sharing for now. 


In the Exercises it is worth doing the ones where you create a small encryption 
and trying to have someone else break it. 
Chapter 12


Some Theory Behind Cryptography


Cryptography is fun in and of itself. However, there are powerful theoretical 
issues at play throughout — as evidenced by the ever-increasing number of 
publications in this area. 

Certainly we can only touch on basic questions in this text, but the reader 
will be gratified to see how much variety there is even thus restricted. We pick 
two of the many theoretical questions to address. 


• How do we find all these big primes, anyway?


• How can we be sure it’s not so easy to break the codes, such as by
factoring big numbers?


12.1 Finding More Primes 


As we have seen, it is not terribly hard to find lots of small primes. One can
use the Sieve of Eratosthenes, or make numbers coprime to known primes and
then factor them.

The problem is that almost every effort to find lots of big primes has been 
stymied. Primes simply do not follow nice enough rules to enable easy detec- 
tion, despite the fact that they seem to follow very nice rules on average — a 
fact we will explore in later chapters. 


12.1.1 Fermat primes 


Here is an interesting historical example. Recall (Subsection 11.5.1) that our
friend Pierre de Fermat thought that numbers of the form $2^{2^n}+1$ would always
be prime (numbers such as 5, 17, and 257).


Definition 12.1.1 We call numbers of the form $F_n = 2^{2^n} + 1$ Fermat numbers.

However, as we mentioned in Subsection 11.5.1, in 1732 Euler proved that
$F_n$ is not prime if n = 5. (See William Dunham’s [E.5.5] for an engaging take
on the story.) Evaluate the following cell, which quickly produces numbers a
bit long for print!


for n in [0..7]:
    pretty_print(html("If $n=%s$, then $2^{2^n}+1=2^{2^%s}+1=%s$ factors as $%s$"%(n,n,2^(2^n)+1,factor(2^(2^n)+1))))


For example,

$$2^{2^7} + 1 = 59649589127497217 \cdot 5704689200685129054721.$$


Nobody knows if there are any more primes¹ in the sequence $F_n$ past n = 4.
Even the prime factors of elements of the sequence seem to be quite large;
see for instance the end of Subsection 12.6.1 for $F_8$, or Subsection 17.5.2 for
even more information. A very accessible article about the properties of a prime
divisor of a Fermat number is [E.7.43], where the authors prove directly that
37 can never divide any $F_n$.

There is a special test called Pépin’s test that tests Fermat numbers for
primality. It is equivalent to checking whether 3 is a primitive root of $2^{2^n}+1$.
Proving it is just a little beyond us right now, so we will not address it yet; see
Fact 17.5.1 for the statement and proof.
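For readers who want to experiment before reaching Fact 17.5.1, the usual textbook form of Pépin's test says that for n ≥ 1, $F_n$ is prime exactly when $3^{(F_n-1)/2} \equiv -1 \pmod{F_n}$. The plain-Python sketch below assumes that this standard form is the one intended; the function names `fermat_number` and `pepin` are illustrative.

```python
def fermat_number(n):
    return 2**(2**n) + 1

def pepin(n):
    """Pepin's test for n >= 1: is 3^((F_n - 1)/2) congruent to -1 modulo F_n?"""
    F = fermat_number(n)
    return pow(3, (F - 1) // 2, F) == F - 1

results = {n: pepin(n) for n in range(1, 8)}
print(results)

# Euler's 1732 observation: 641 divides F_5.
print(fermat_number(5) % 641 == 0)

# The factorization of F_7 quoted in the text.
print(fermat_number(7) == 59649589127497217 * 5704689200685129054721)
```

Three-argument `pow` keeps every intermediate result below $F_n$, which is what makes the test practical even when $F_n$ has dozens of digits.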


12.1.2 Primes from Fermat numbers 


However, we can at least prove what seems obvious in the computation af- 
ter Definition 12.1.1 — namely, that lots of primes arise as factors of Fermat 
numbers, even when F,, isn’t itself prime. First, we need a lemma. 


Lemma 12.1.2 Suppose $\ell = jk$ is even, and k is an even factor. Then $2^\ell - 1$
factors as

$$2^{\ell} - 1 = 2^{jk} - 1 = \left(2^j + 1\right)\left((2^j)^{k-1} - (2^j)^{k-2} + \cdots + (2^j) - 1\right)$$

Proof. Multiply and/or apply a little induction. (See Exercise 12.7.1.) ∎

Example 12.1.3 For instance, $2^6 - 1 = 63$ factors as

$$2^{3 \cdot 2} - 1 = (2^3 + 1)(2^3 - 1)$$

which corresponds to the factorization 9 · 7.
Similarly, $2^{12} - 1 = 4095$ factors as

$$2^{3 \cdot 4} - 1 = (2^3 + 1)(2^9 - 2^6 + 2^3 - 1)$$

which corresponds to the factorization 9 · 455.


¹ See the witty article [E.7.24] for an argument that we shouldn’t expect many!


Proposition 12.1.4 Fermat numbers are coprime. $F_m = 2^{2^m}+1$ and
$F_n = 2^{2^n}+1$ are coprime if m ≠ n.

Proof. First, notice that any two Fermat numbers are very closely related to
each other; if n < m, then $F_n - 1$ divides $F_m - 1$. In fact, one is a power of
the other:

$$2^{2^m} = \left(2^{2^n}\right)^{2^{m-n}}$$

Because of this, using Lemma 12.1.2 with $j = 2^n$ and $k = 2^{m-n}$ (which is
certainly even), we get

$$2^{2^m} - 1 = \left(2^{2^n} + 1\right)\left(\left(2^{2^n}\right)^{k-1} - \left(2^{2^n}\right)^{k-2} + \cdots + \left(2^{2^n}\right) - 1\right)$$

This implies the divisibility relationship

$$F_n \mid F_m - 2$$

so any number d that divides $F_n$ also divides $F_m - 2$. Now we do a standard
trick (see also Exercise 2.5.6). Combine all of the above facts to see that any
divisor of $F_n$ which also divides $F_m$ must divide $F_m - (F_m - 2) = 2$, so a
common divisor of $F_m$ and $F_n$ could only be two or one.

But both Fermat numbers are odd, so the gcd must be 1. ∎
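Both the lemma and the proposition are easy to spot-check numerically. The plain-Python sketch below uses illustrative helper names (`alternating_cofactor`, `lemma_holds`) and only looks at the first eight Fermat numbers, so it is a sanity check rather than a proof.

```python
from math import gcd
from itertools import combinations

def alternating_cofactor(j, k):
    """(2^j)^(k-1) - (2^j)^(k-2) + ... + (2^j) - 1, for even k."""
    x = 2**j
    return sum((-1)**i * x**(k - 1 - i) for i in range(k))

def lemma_holds(j, k):
    """Check 2^(jk) - 1 == (2^j + 1) * cofactor for an even k."""
    return (2**j + 1) * alternating_cofactor(j, k) == 2**(j * k) - 1

fermat = [2**(2**n) + 1 for n in range(8)]

# F_n divides F_m - 2 whenever n < m ...
divisibility = all((fermat[m] - 2) % fermat[n] == 0
                   for n in range(8) for m in range(n + 1, 8))
# ... and so distinct Fermat numbers share no common factor.
pairwise_coprime = all(gcd(a, b) == 1 for a, b in combinations(fermat, 2))

print(alternating_cofactor(3, 2), alternating_cofactor(3, 4))
print(lemma_holds(3, 2), lemma_holds(3, 4), divisibility, pairwise_coprime)
```

The two cofactors printed first are the 7 and 455 from Example 12.1.3.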


12.1.3 Mersenne primes 
Another early attempt at finding big primes was an idea of Marin Mersenne. 


Historical remark 12.1.5 Marin Mersenne. Mersenne was a Minim
monk who not only acted as a clearinghouse for scientific knowledge in early
17th century France (particularly between Pascal², Fermat³, Descartes⁴,
Roberval⁵, and their friends) but also wrote major theological and music theoretical
treatises of his own. See Figure 19.4.12.

Mersenne suggested⁶ that one try searching for primes of the form $2^p - 1$,
where p is itself prime.


Definition 12.1.6 In general, numbers of the form $M_n = 2^n - 1$ are called
Mersenne numbers. If they are prime, they are called Mersenne primes.

Using a variant of Lemma 12.1.2 (see Exercise 12.7.2), it is not too hard to
prove that if n is composite then $M_n$ is too; see Exercise 12.7.7. Further, not
every $M_p$ for prime p is prime either; evaluate the following Sage cell to verify
this.


for p in prime_range(100):
    pretty_print(html("If $p=%s$, then $2^p-1=2^{%s}-1=%s$ factors as $%s$"%(p,p,2^p-1,factor(2^p-1))))


Certainly the computation above doesn’t always give primes (recall for 
instance the discussion at the end of Subsection 7.5.2), but it’s not a bad 
source. 


² www-history.mcs.st-andrews.ac.uk/Biographies/Pascal.html
³ www-history.mcs.st-andrews.ac.uk/Biographies/Fermat.html
⁴ www-history.mcs.st-andrews.ac.uk/Biographies/Descartes.html
⁵ www-history.mcs.st-andrews.ac.uk/Biographies/Roberval.html
⁶ For more on the precise nature of his suggestion, its provenance, and the ‘rule’ by which
he seems to have tried to decide which of these numbers should be considered, see Stillman
Drake’s article The rule behind ‘Mersenne’s numbers’ in Physis Volume 13, Number
4, and Vittorio Boria’s dissertation, Marin Mersenne: Educator of scientists (available at
https://dra.american.edu).
Historical remark 12.1.7 GIMPS. You can help the world search for more
Mersenne primes if you leave your personal computer on and connected to the
Internet, via the Great Internet Mersenne Prime Search⁷ (GIMPS). Random
computers in labs at the University of Central Missouri and UCLA have found
some of the largest known primes this way.

The most recent one (as of this writing in May 2023) was found in December
2018⁸! The largest known such primes are very large; this one has nearly
twenty-five million digits, and the folks at Numberphile made a very amusing video⁹
unwrapping a book containing a previous record holder of ‘only’ twenty-two
million digits. GIMPS even won a monetary prize for finding these huge primes;
they shared it with many of the people who made it possible.


Historical remark 12.1.8 The Skylake bug. These primes are far too
large, and are not common enough, to use for most serious applications¹⁰, but
nonetheless they help us investigate ideas about primes. A less obvious but
interesting application is that searching for very large primes can also help
more mundane hardware testing.

A good example of this is that running the GIMPS program uncovered
a bug in a major Intel chip¹¹. Number theory can push our hardware (and
software!) beyond our imagination. (See also Historical remark 22.3.9.)

Implementing a program like this on normal computers is conceivable
because of a special test which applies just to numbers of the form $2^p - 1$.


Algorithm 12.1.9 Lucas-Lehmer test. Let $x_0 = 4$ and let $p$ be prime
(greater than 2). To test whether $2^p - 1$ is prime, create the list of numbers


$x_{n+1}$ = residue of $x_n^2 - 2$ modulo $2^p - 1$


Do this $p - 2$ times; if the result $x_{p-2}$ is divisible by $2^p - 1$ (i.e., is zero modulo
$2^p - 1$), then $2^p - 1$ is in fact prime.


Example 12.1.10 With $p = 5$ and $2^p - 1 = 31$, we would start with $x_0 = 4$;
doing it $5 - 2 = 3$ times gives:


1. $4^2 - 2 = 14$ modulo 31 is 14
2. $14^2 - 2 = 194$ modulo 31 is 8
3. $8^2 - 2 = 62$ modulo 31 is 0


And of course 31 is indeed prime. 


You can try the test, naively implemented in Sage, in the following cell. 


@interact
def _(p=(71,prime_range(100))):
    test = 4
    num = 2^p-1
    for i in range(p-2):
        test=(test^2-2)%num
    pretty_print(html("The test says "+str(bool(test==0))))
    pretty_print(html("And in fact $2^{%s}-1=%s$ primality is "%(p,num)+str(is_prime(num))))


⁷ www.mersenne.org
⁸ www.mersenne.org/primes/press/M82589933.html
⁹ www.youtube.com/watch?v=tlpYjrbujG
¹⁰ Though see United States patent 6307935, which explicitly uses them to directly encrypt
onto a special elliptic curve.
¹¹ arstechnica.com/gadgets/2016/01/intel-skylake-bug-causes-pcs-to-freeze-during-complex-workloads/

Proving Algorithm 12.1.9 is slightly beyond our capabilities in this text.


12.1.4 Primes from Mersenne numbers 


We can prove the lesser result that Mersenne numbers are coprime, which (just 
as with the Fermat numbers) can give us a lot of interesting prime factors. 


Proposition 12.1.11 Mersenne numbers are coprime. Mersenne numbers
$2^p - 1$ and $2^q - 1$ with coprime exponents are themselves coprime.

Proof. By way of contradiction, let $d > 1$ be the gcd of the two numbers $2^p - 1$
and $2^q - 1$. Let’s investigate the order of $2 \neq 1$ in $U_d$. (Before reading more,
think about why 2 is even in this group.)

By definition of divisibility,


$2^p \equiv 1 \pmod{d}$ and $2^q \equiv 1 \pmod{d}$


By group theory (use Theorem 8.3.12) we know that $2^k \equiv 1$ means that $k$ is a
multiple of the order $|2|$ of the element 2. Thus $p$ and $q$ both are multiples of
$|2|$.
Since $p$ and $q$ are coprime, though, the only possibility for $|2|$ is that $|2| = 1$.
But order one would mean $2 \equiv 1 \pmod{d}$, which we already ruled out.
This is a contradiction, so our assumption that $d > 1$ was wrong. □
See this linked video featuring Holly Krieger, by Numberphile¹², for an
interesting take on this. Namely, all Mersenne numbers after $2^6 - 1$ (even the
ones where $p$ is not prime!) have a new prime divisor.
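As a quick numerical sanity check of Proposition 12.1.11, here is a minimal sketch in plain Python (not Sage). The helper name `mersenne` and the particular list of exponents are illustrative choices, not part of the text.

```python
from math import gcd
from itertools import combinations

def mersenne(k):
    """Return the Mersenne number 2^k - 1 (hypothetical helper name)."""
    return 2**k - 1

# Pairwise coprime exponents (here simply distinct primes).
exponents = [2, 3, 5, 7, 11, 13]

# Compute the gcd of every pair of the corresponding Mersenne numbers.
pairwise_gcds = {
    (p, q): gcd(mersenne(p), mersenne(q))
    for p, q in combinations(exponents, 2)
}
all_coprime = all(g == 1 for g in pairwise_gcds.values())
print(all_coprime)

# By contrast, exponents with a common factor share a divisor:
shared = gcd(mersenne(4), mersenne(6))  # gcd(15, 63)
print(shared)
```

The second computation shows why the hypothesis on the exponents matters: 4 and 6 are not coprime, and the two Mersenne numbers share the factor 3.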


12.2 Primes — Probably 


Primality testing is full of little tidbits like those in the previous section, and 
tantalizingly devoid of easy methods that work for all special cases. Indeed, 
none of these paths lead us to reliable, reasonably fast discovery of large primes 
for cryptographic purposes, nor do other computationally infeasible methods 
like using Wilson’s Theorem or other even stranger formulas (some of which 
appear later in this text). 

Instead, what is typically done is to pick a number, and then use tests on 
it that do not guarantee primality! 

Why would this work? The idea is that if a given number passes enough 
tests that do not guarantee primality but have a quite low false positive rate 
in practice, then the probability the number you have is composite is lower 
than the (very low) chance that your computer made an arithmetic error due 
to cosmic rays (though one still has to be careful of bugs like the one described 
in the discussion before Algorithm 12.1.9). 

This is astonishing, but true. Then if you end up with a number that is likely
to be prime, you can always confirm its primality with one of the various slower
tests I will not describe.


12.2.1 Pseudoprimes 


We start this discussion with our visual representation of powers (see Subsec- 
tion 8.2.1). 


¹² www.youtube.com/watch?v=09JsLnY7W_k


CHAPTER 12. SOME THEORY BEHIND CRYPTOGRAPHY 190 
Figure 12.2.1 Colored table of powers modulo $n = 11$
Notice again here that Fermat’s Little Theorem is visible in the second-to-last
column. The graphic has been expanded, so that the last column is a slight
restatement thereof, true for all $a$:


$a^p \equiv a \pmod{p}$.


(See Exercise 12.7.3 and Exercise 9.6.3.) Go ahead and confirm it in the
interactive version.


import matplotlib.pyplot as plt
from matplotlib.ticker import IndexLocator, FuncFormatter
@interact
def power_table_plot(p=(11,prime_range(100)[2:])):
    mycmap = plt.get_cmap('gist_earth',p-1)
    myloc = IndexLocator(floor(p/5),.5)
    myform = FuncFormatter(lambda x,y: int(x+1))
    cbaropts = { 'ticks':myloc, 'drawedges':True,
        'boundaries':srange(.5,p+.5,1)}
    P=matrix_plot(matrix(p-1,[mod(a,p)^b for a in range(1,p)
        for b in srange(p+1)]),cmap=mycmap, colorbar=True,
        colorbar_options=cbaropts, ticks=[myloc,myloc],
        tick_formatter=[None,myform])
    show(P, figsize=6)


This is a useful criterion, as it works for all input, including multiples of 
the modulus. We can now use it to state a test for possible primality: 


Fact 12.2.2 If there is an $a$ such that $a^n \not\equiv a \pmod{n}$, then $n$ must be
composite.


So if $a^n \equiv a \pmod{n}$ for a given $n$, it’s at least possible that $n$ is prime.


Definition 12.2.3 If $a^n \equiv a \pmod{n}$, we say $n$ passes the base $a$ test.


It turns out that everyone from the ancient Chinese to Leibniz used this 
test for the base a = 2 to assert numbers are prime. And it doesn’t do a 
bad job. As some former students pointed out, it’s sort of like internet date 
matching for primes; it doesn’t always work but can succeed reasonably often. 
@interact
def _(n=100):
    pretty_print(html("Here are the numbers through $%s$ that pass the base 2 test"%n))
    pretty_print(html("along with whether they are actually prime"))
    for i in [2..n]:
        if mod(2^i,i)==2:
            pretty_print(html(r"$2^{%s}\equiv 2\text{ (mod }%s)$ and the primality of $%s$ is %s"%(i,i,i,is_prime(i))))


We can change the numbers in the range of the preceding interact to check 
for more — say up to 1000, which allows exploring the following question. 


Question 12.2.4 Are there any numbers which satisfy the base a test and are 
not prime? 


To the surprise of many in the world of numbers, the answer is yes. The 
numbers n = 341, n = 561, and n = 645 turn out to fall in that category (for 
base a = 2). 


print("We factor 341 and get %s"%factor(341))
print("We factor 561 and get %s"%factor(561))
print("We factor 645 and get %s"%factor(645))


We factor 341 and get 11 * 31 
We factor 561 and get 3 * 11 * 17 
We factor 645 and get 3 * 5 * 43 


That’s still not bad — out of one hundred seventy-one total such potential 
primes base 2, only three of them actually are not prime, or about one and 
three quarters percent. That is unusual enough that we have a special name 
for composite numbers that pass one of the base a tests. 


Definition 12.2.5 Pseudoprimes. If $a^n \equiv a \pmod{n}$ but $n$ is not prime,
we say it is a pseudoprime base $a$. ◊

That is to say, if a number satisfies Fermat’s Little Theorem, we think it is 
likely enough to be prime to call it a pseudoprime if it isn’t. (Prime, that is.) 
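The count quoted above can be reproduced with a short sketch in plain Python (rather than Sage). The function names `is_prime` and `passes_base_test` are illustrative, and the primality check is deliberately naive trial division, which is adequate only because the range is small.

```python
def is_prime(n):
    """Naive trial-division primality check; fine for small n only."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def passes_base_test(n, a=2):
    """True when a^n is congruent to a modulo n (the base a test)."""
    return pow(a, n, n) == a % n

# All numbers from 2 to 1000 that pass the base 2 test.
passing = [n for n in range(2, 1001) if passes_base_test(n)]

# Those among them that are composite: the pseudoprimes base 2.
pseudoprimes = [n for n in passing if not is_prime(n)]

print(len(passing), pseudoprimes)
```

Here `pow(a, n, n)` is Python's built-in modular exponentiation, so the loop never forms the huge number $2^n$ itself.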


Remark 12.2.6 We will loosely follow a somewhat standard convention,
particularly since we’re talking about finding primes, and only consider odd
pseudoprimes. In fact, according to an article by some experts in
pseudoprimes [E.7.32], the first even pseudoprime to the base 2 ($161038 = 2 \cdot 73 \cdot 1103$)
was only discovered in 1950. See also Exercise 12.7.16.

Perhaps unfortunately for cryptographers (though interestingly for pure
mathematicians!), it turns out that there are infinitely many such pseudoprimes.


Fact 12.2.7 If $n$ is (an odd) pseudoprime (base 2), then so is $2^n - 1$.


We will get this result as a corollary of something stronger soon (see Corol- 
lary 12.4.3 and Theorem 12.4.2). 

All the Fermat and Mersenne numbers pass the base 2 test, incidentally, 
though they are all quite large compared to a typical number you might try. 


12.2.2 Prime impostors, and how to avoid them 


If we want to check things out more carefully, we can try to test for primality 
with a different base. In the next cell, we choose a = 3. 
for n in [341,561,645]:
    pretty_print(html(r"$3^{%s}\equiv %s\text{ (mod }%s)$"%(n,mod(3,n)^n,n)))


As you can see, this exposes 341 and 645 as fakes. What about 561? Let’s 
try that one with base a = 5 as well. 


@interact
def _(p=(5,prime_range(50))):
    for pr in prime_range(next_prime(p)):
        pretty_print(html(r"$%s^{561}\equiv %s\text{ (mod }561)$"%(pr,mod(pr,561)^561)))


Hmm, that’s interesting. What if I change to a different prime base, like
$a = 7$ or $11$? Try it above.

In the next cell, I get systematic. We should expect output if 561 doesn’t 
pass the base a test for some a. 


@interact
def _(p=(5,prime_range(1000))):
    pretty_print(html("The primes up to $%s$ for which $561$ fails the base $p$ test:"%p))
    for pr in prime_range(next_prime(p)):
        if mod(pr,561)^561!=pr:
            pretty_print(html(r"$%s^{561}\equiv %s\text{ (mod }561)$"%(pr,mod(pr,561)^561)))


It appears that $p^{561} \equiv p \pmod{561}$ for every prime $p$! Let’s prove it.


Fact 12.2.8 The number 561 is a pseudoprime for every integer base $a$.
Proof. We know that
$561 = 3 \cdot 11 \cdot 17$,


so by Fact 7.2.2 (and, ultimately, the Chinese Remainder Theorem)
$a^{561} \equiv a \pmod{561}$


if and only if $a^{561} \equiv a$ holds for the prime power factors $3, 11, 17$; so we will
check them.
Remember, the exponents for these congruences live in the (mod $\phi(p)$) world,
so we just need to check what 561 is in each of those worlds. We get:
• $561 \equiv 1 \pmod{16 = 17 - 1}$ so $a^{561} \equiv a^1 \pmod{17}$
• $561 \equiv 1 \pmod{2 = 3 - 1}$ so $a^{561} \equiv a^1 \pmod{3}$
• $561 \equiv 1 \pmod{10 = 11 - 1}$ so $a^{561} \equiv a^1 \pmod{11}$

That is, for $p = 3, 11, 17$ we see


$a^{561} \equiv a^1 \pmod{p}$

Using Proposition 5.4.5, this congruence is always true!

By the way, we note that $a^{560}$ is not always congruent to 1 (it cannot be when $a$
shares a factor with 561), which explains why we
use $a^n \equiv a$ for these definitions. □
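Fact 12.2.8 is small enough to confirm by brute force. The following is a minimal sketch in plain Python (not Sage); the variable names are illustrative. It checks every residue $a$ modulo 561, and also records the bases for which $a^{560}$ is not 1.

```python
n = 561

# Bases a (as residues modulo n) for which a^n is NOT congruent to a.
failures = [a for a in range(n) if pow(a, n, n) != a]

# Bases for which a^(n-1) is not congruent to 1; these are exactly
# the residues sharing a factor with 561 = 3 * 11 * 17.
not_one = [a for a in range(1, n) if pow(a, n - 1, n) != 1]

print(failures)        # an empty list means 561 passes every base
print(not_one[:5])
```

The second list is the reason the definition uses $a^n \equiv a$ rather than $a^{n-1} \equiv 1$: the first form holds for every base, while the second fails for multiples of 3, 11, or 17.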
Definition 12.2.9 We call a number which is pseudoprime to every base $a$,
but is not a prime number, a Carmichael number, in honor of the first person
to actually produce such numbers, Robert Carmichael (in 1912). ◊
So is 561 a Carmichael number? We saw the factorization above, but here
it is again:


factor(561)


3 * 11 * 17


The proof of Fact 12.2.8 suggests that to find a Carmichael number $n$, we
might want to look at $n$ which are a product of primes $p_i$ such that $n \equiv 1$
in the exponent world of $p_i$. It turns out that this is true, and one can prove
something even more specific.


Proposition 12.2.10 Korselt’s Theorem. Carmichael numbers are precisely
those composite $n$ for which $n$ is a product of at least two distinct primes
$p_i$ (no squares)

$n = p_1 p_2 p_3 \cdots p_k$ with $p_i \neq p_j$


such that
$p_i - 1 \mid n - 1$


for all the prime factors.

Proof. Prime numbers satisfy almost all the conditions trivially. To show that
561 is a Carmichael number we used this idea in the form $n \equiv 1 \pmod{\phi(p_i)}$
for all three prime factors, and essentially the same argument shows that any
number satisfying the hypotheses is a Carmichael number.

We will not prove the other half of this theorem (that all Carmichael numbers
have this form). It is not hard, however, using a slight variant on the Euler $\phi$
function one can acquire from investigating $U_n$ for composite $n$. □
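Korselt’s criterion translates directly into a search procedure. The sketch below is plain Python (not Sage) and uses a naive factorization helper; the names `prime_factorization` and `is_carmichael_by_korselt` are illustrative, not from the text.

```python
def prime_factorization(n):
    """Return {prime: exponent} for n >= 2, by naive trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def is_carmichael_by_korselt(n):
    """Check the three conditions of Korselt's Theorem for n >= 2."""
    factors = prime_factorization(n)
    if len(factors) < 2:                        # at least two distinct primes
        return False
    if any(e > 1 for e in factors.values()):    # no repeated prime (no squares)
        return False
    return all((n - 1) % (p - 1) == 0 for p in factors)  # p - 1 divides n - 1

carmichaels = [n for n in range(2, 3000) if is_carmichael_by_korselt(n)]
print(carmichaels)
```

Note that the check never raises anything to the power $n$; it only needs the factorization of $n$, which is what makes the criterion convenient for small searches like this one.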


Example 12.2.11 Evaluate this Sage cell to see the previous result applied to 
identify another Carmichael number. 


n=29341
pretty_print(html("$%s$ is composite with factorization $%s$, but"%(n,factor(n))))
for fact,pow in factor(n):
    pretty_print(html(r"$%s^{%s}\equiv %s\text{ (mod }%s)$"%(fact,n,mod(fact,n)^n,n)))
pretty_print(html("and"))
for fact,pow in factor(n):
    pretty_print(html(r"$%s\equiv %s\text{ (mod }\phi(%s)=%s)$"%(n, mod(n, euler_phi(fact)), fact, euler_phi(fact))))


12.3 Another Primality Test 


For a long time it was open whether we might be lucky and show there are
only finitely many Carmichael numbers. However, as was proved in the
mid-nineties¹³, there are infinitely many Carmichael numbers.

So now what? Can we find other ways to reliably get primes? 


12.3.1 Another pattern 


To answer this, we turn to another result visible in our modular power graphic. 


¹³ www.math.dartmouth.edu/~carlp/PDF/paper95.pdf

Figure 12.3.1 Colored table of powers modulo $n = 11$


As usual, Fermat’s Little Theorem is the right-hand column. What’s that 
pattern in the middle column? Can you confirm it in the interactive version? 


import matplotlib.pyplot as plt
from matplotlib.ticker import IndexLocator, FuncFormatter
@interact
def power_table_plot(p=(11,prime_range(100)[2:])):
    mycmap = plt.get_cmap('gist_earth',p-1)
    myloc = IndexLocator(floor(p/5),.5)
    myform = FuncFormatter(lambda x,y: int(x+1))
    cbaropts = { 'ticks':myloc, 'drawedges':True,
        'boundaries':srange(.5,p+.5,1)}
    P=matrix_plot(matrix(p-1,[mod(a,p)^b for a in range(1,p)
        for b in srange(p)]),cmap=mycmap, colorbar=True,
        colorbar_options=cbaropts, ticks=[myloc,myloc],
        tick_formatter=[None,myform])
    show(P, figsize=6)


Theorem 12.3.2 The Square Root of Fermat’s Little Theorem.
For any odd prime modulus $p \nmid a$, we have $a^{(p-1)/2} \equiv \pm 1 \pmod{p}$.
Proof. Since $a^{p-1} \equiv 1$ we know that $a^{(p-1)/2}$ is a solution to $x^2 \equiv 1$. (Note
that $p$ is odd so $(p-1)/2$ makes sense.)

As in Section 7.3, we can rewrite and factor the congruence $x^2 \equiv 1$ as
$p \mid x^2 - 1 = (x+1)(x-1)$. Given that $p$ is an odd prime, that means $p \mid x - 1$ or
$p \mid x + 1$.

Then $x \equiv \pm 1 \pmod{p}$. (This is restated in Subsection 16.1.1.) Since $a^{(p-1)/2}$
is one such solution, then $a^{(p-1)/2} \equiv \pm 1 \pmod{p}$. □
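The middle column of the power table can also be read off numerically. This is a small sketch in plain Python (not Sage) for the modulus $p = 11$ used in the figure; the variable names are illustrative.

```python
p = 11

# The "middle column": a^((p-1)/2) modulo p for each nonzero residue a.
middle_column = {a: pow(a, (p - 1) // 2, p) for a in range(1, p)}

# Theorem 12.3.2 says only +1 and -1 (that is, p - 1) can occur.
values = set(middle_column.values())
print(middle_column)
print(values)

# For a composite modulus the same power can land elsewhere:
composite_value = pow(3, (561 - 1) // 2, 561)
print(composite_value)
```

The last line anticipates the use of the theorem as a compositeness test: a value other than $\pm 1$ can never happen for a prime modulus.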


What is the use for us of this theorem? Think similarly to the pseudoprime
situation. Imagine we are testing some number $n$ for primality, and we
find a base $a$ (not a multiple of $n$) for which

$a^{(n-1)/2} \not\equiv \pm 1 \pmod{n}$;


then that number is definitely not prime.
Let’s try this on our pesky Carmichael number, once again starting with
base $a = 2$. (Remember that we already know $2^{561-1} \equiv 1$ since 561 is a
pseudoprime.)


mod(2,561)^((561-1)/2)


1


Not again! Try another base; maybe $a = 3$?


mod(3,561)^((561-1)/2)


441


Phew, this works, as $3^{(561-1)/2} \not\equiv \pm 1$ (and 561 is not prime). So this
criterion does help us test at least a little better.


12.3.2 Miller’s test 


A slightly stronger variant of this test is called Miller’s test base a for 
primality, after American computer scientist Gary Miller. 


Algorithm 12.3.3 Miller’s test for base a. We will proceed by repeatedly 
dividing and then checking a congruence. 


• Begin with $n - 1$; divide it by two, and then check the power


$a^{(n-1)/2} \pmod{n}$.


If the result is $-1$ we say $n$ passes Miller’s test. If the result is not $\pm 1$,
we say it fails Miller’s test (since if $n$ is prime, the result would certainly
be $\pm 1$). If the result is $+1$, we continue.


• If we have arrived at a point where we can no longer divide $n - 1$ by
two, we say $n$ passes Miller’s test. Otherwise, assuming $a^{(n-1)/2} \equiv 1$, we
continue by dividing the power itself by two and then taking $a$ to that new
power. Once again, if the result is $-1$ we say $n$ passes the test, and if it
is not $\pm 1$, we say it fails.


• If the result is $+1$ and we can continue dividing the power by two, do so
and check the result, as often as need be. If we arrive at the point where
we have divided $n - 1$ by all possible powers of two and the result is still
$+1$, then we say $n$ passes the test.
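The steps above can be sketched as a short loop. This is plain Python (not Sage), written under the assumption that $n$ is odd and greater than 2; the function name `passes_miller` is an illustrative choice.

```python
def passes_miller(n, a):
    """Miller's test base a for an odd n > 2, following the steps above.

    Repeatedly halve the exponent, starting from n - 1, and look at
    a^exponent modulo n after each halving.
    """
    exponent = n - 1
    while exponent % 2 == 0:
        exponent //= 2
        result = pow(a, exponent, n)
        if result == n - 1:      # result is -1: n passes
            return True
        if result != 1:          # result is neither +1 nor -1: n fails
            return False
        # result is +1: keep halving if we still can
    return True                  # exponent is odd and the result was still +1

for n in [1387, 2047, 1009, 561]:
    print(n, passes_miller(n, 2))
```

The four moduli are the ones discussed in this section: 1387 and 561 are exposed as composite by base 2, the composite 2047 slips through, and the prime 1009 passes as it must.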


Example 12.3.4 Let’s see a few examples of this. First, the number 1387 is 
a pseudoprime base 2 — but it does not pass Miller’s test, which is good since 
it’s composite. Try the following cell to see exactly what happens. 


n=1387
pretty_print(html("We know $%s$ is composite because it factors as $%s$"%(n,factor(n))))
pretty_print(html("Let's check $2^{(%s-1)/2}$ modulo $%s$: it's $%s$"%(n,n,mod(2,n)^((n-1)/2))))


Looking good ... But let’s try another pseudoprime number (the Mersenne
number $M_{11}$, in fact) to see if it passes, just to be sure.


n=2047
pretty_print(html("We know $%s$ is composite because it factors as $%s$"%(n,factor(n))))
pretty_print(html("Let's check $2^{(%s-1)/2}$ modulo $%s$: it's $%s$"%(n,n,mod(2,n)^((n-1)/2))))


As we can see, this shows that n = 2047 passes the first part of Miller’s test 
base 2, and that there is no further to go because (2047 — 1)/2 = 1023 is odd. 
So, as far as we know thus far, 2047 is prime (though actually it is the lowest 
Mersenne number with prime exponent not to be prime). 

Let’s try Algorithm 12.3.3 with another number, 1009. 


n=1009
pretty_print(html("We know $%s$ is prime because it factors as $%s$"%(n,factor(n))))
pretty_print(html("Let's check $2^{(%s-1)/2}$ modulo $%s$: it's $%s$"%(n,n,mod(2,n)^((n-1)/2))))
pretty_print(html("Let's check $2^{(%s-1)/2/2}$ modulo $%s$: it's $%s$"%(n,n,mod(2,n)^((n-1)/2/2))))


This passes Miller’s test the first time, but the algorithm keeps going since
our first computation was $\equiv 1$. The second time we got $\equiv -1$, so we stop and
hope the number is prime. (It is, in this case!)


12.4 Strong Pseudoprimes 


Since composite numbers can pass Miller’s test too, nomenclature can get 
frustrating if we don’t organize. So we come up with another name. 


Definition 12.4.1 We call a composite number $n$ that passes Miller’s test base
$a$ a strong pseudoprime base $a$. ◊

The bad news is that strong pseudoprimes exist, as we saw above with 
n = 2047. In fact, we can prove a theorem about them analogous to Fact 12.2.7, 
and which implies it (see Corollary 12.4.3). 


Theorem 12.4.2 If $n$ is a pseudoprime base 2, then $2^n - 1$ is a strong
pseudoprime base 2.
Proof. As per our convention, let $n$ be composite and odd, but suppose it passes the
base two test:

$2^n \equiv 2 \pmod{n}$.


Since $n$ is odd, we can cancel 2 in the congruence, and get
$2^{n-1} \equiv 1 \pmod{n}$.


Rewrite this as $2^{n-1} - 1 = nk$ for some integer $k$.
Since $2^{n-1} - 1$ is odd, then so is $k$ necessarily. Now comes some final
manipulation to prepare to apply Miller’s test to $2^n - 1$:


$(2^n - 1) - 1 = 2^n - 2 = 2(2^{n-1} - 1) = 2nk$.


Now use the preceding equation as the exponent in Miller’s test and a clever
reduction:


$2^{[(2^n - 1) - 1]/2} = 2^{[2nk]/2} = 2^{nk} = (2^n)^k \equiv 1^k \equiv 1 \pmod{2^n - 1}$


Since $[(2^n - 1) - 1]/2 = 2^{n-1} - 1$ is odd, the number passes Miller’s test.
All that remains is to show $2^n - 1$ is composite if $n$ is composite; this is a fairly
straightforward extension of Lemma 12.1.2 (see Exercise 12.7.7). □


Corollary 12.4.3 If $n$ is a pseudoprime base 2, so is $2^n - 1$. (This is
Fact 12.2.7.)


Proof. All we need is that $(\pm 1)^2 = 1$. □


Corollary 12.4.4 There are infinitely many strong pseudoprimes (and hence
pseudoprimes) base 2.

Proof. Take your favorite pseudoprime, and keep subtracting one from two to
the power of the previous (strong) pseudoprime. □


Example 12.4.5 For instance, we now know that $2^{341} - 1$ must fall in that
category. If you try the cell below you will see that the (very large) second
number is odd, which confirms it.


n=2^341-1
print(mod(2,n)^((n-1)/2))
print((n-1)/2)


But there are not any ‘strong Carmichael numbers’! In fact: 


Theorem 12.4.6 If n is an odd composite positive integer, then n passes 
Miller’s test for at most (n — 1)/4 bases a between 1 and n — 1. 

Although the proof is accessible to us at this point, we will not provide it 
for the sake of space. It counts numbers of solutions of xf — 1 modulo various 
prime powers and combines them with the Chinese Remainder Theorem to 
give a good counting argument. 

Needless to say, no one could use the base $a$ test for enough bases to prove
primality for any realistic $n$! But Michael Rabin used this fact to suggest a test
for a probable prime with probability of failure less than $(1/4)^k$ for any desired
$k$.


Algorithm 12.4.7 Miller-Rabin (probabilistic) primality test. Run
Miller’s test for $k$ different bases less than $n - 1$. If a number passes all of
them, the probability of failure is less than $(1/4)^k$.


For 100 bases, this is the probability that would come out.


(1./4)^100


6.22301527786114e-61 


So if you run the test for 100 bases, you are in pretty decent shape. 
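To make Algorithm 12.4.7 concrete, here is a minimal sketch in plain Python (not Sage, and not the routine Sage itself uses). It restates Miller’s test so the block is self-contained; the names `passes_miller` and `is_probable_prime`, the default of 20 rounds, and the seeded random generator are all assumptions made for illustration. Bases are drawn at random, so they are not guaranteed to be distinct as the algorithm statement asks.

```python
import random

def passes_miller(n, a):
    """Miller's test base a for odd n > 2 (halve the exponent from n - 1)."""
    exponent = n - 1
    while exponent % 2 == 0:
        exponent //= 2
        result = pow(a, exponent, n)
        if result == n - 1:
            return True
        if result != 1:
            return False
    return True

def is_probable_prime(n, rounds=20, rng=None):
    """Run Miller's test for `rounds` random bases a with 1 < a < n - 1."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    rng = rng or random.Random()
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        if not passes_miller(n, a):
            return False         # definitely composite
    return True                  # composite with probability below (1/4)^rounds

rng = random.Random(2023)        # fixed seed only so runs are repeatable
print(is_probable_prime(2**61 - 1, rng=rng))
print(is_probable_prime(561, rng=rng))
```

A `False` answer is a proof of compositeness, while a `True` answer is only probabilistic; that asymmetry is why a slower certificate of primality may still be wanted afterward.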

You can also always use some slow test to prove primality. That is what 
is called a certificate of primality, and although you may not believe it, 
programs that reliably generate reasonably large (100-200 digits, right now) 
primes and can verify it are hot items on the virtual shelves of those who care 
about such things. 

Finally, let’s see this in action. Remember that we wanted keys larger than 
1024 bits for at least a semblance of security in RSA? Here we go with a start: 


p=next_probable_prime(randrange(2^1024))
q=next_probable_prime(randrange(2^1024))
n=p*q

pretty_print(html(p))
pretty_print(html(q))
pretty_print(html(n))

The p and q we get above are just probable primes. Verifying them could
take a little longer! Here, we try it with just one of them.


p=next_probable_prime(randrange(2^1024))
%time is_prime(p)


CPU times: user 1.35 s, sys: 0 ns, total: 1.35 s
Wall time: 1.35 s 


True 


Sage note 12.4.8 Reminder about timing. Don’t forget, you could use 
%time is_prime(p) to time this operation in a worksheet or Sage command 
line. 


12.5 Introduction to Factorization 


Let’s take a last crack at issues directly related to cryptography. (That doesn’t 
mean that other stuff we do in this text is unrelated — oh no! Especially the 
geometry is connected. But we will not make direct connections.) 

We will focus on the main attack on the RSA algorithm, namely finding 
nontrivial factorizations, or factorization. 


12.5.1 Factorization and the RSA 


Let’s look at another toy RSA problem to get a sense of what is going on. First, 
I choose a modulus n = 899. I will also use Sage to verify it has two prime 
factors, without telling you what they are. 


n=899

print("There are %s prime factors and their powers are %s and %s."%(len(n.factor()), n.factor()[0][1], n.factor()[1][1]))


There are 2 prime factors and their powers are 1 and 1.


Then I choose an exponent to raise my secret message by ...


e=13
print("We choose n=%s and exponent e=%s, and verify that gcd(e,phi(n))=1: %s"%(n,e, 1==gcd(e, euler_phi(n))))


We choose n=899 and exponent e=13, and verify that
gcd(e,phi(n))=1: True


I haven’t told you $\phi(n)$, but this guarantees it is coprime to my (public)
encryption key, which I have chosen to be $e = 13$. Now we can encode our
message, $x = 11$.


x=11
message=mod(x,n)^e
message


21 


Now, how could we hope to crack this sinister message? (Assume that Sage
doesn’t have enough power to compute euler_phi(899) directly.) Well, we do
know $n = 899$ and that $e = 13$. That could help. Remember, if we knew $p$
and $q$, we could easily calculate $\phi(n)$ without even using Sage, which should
be enough.

Question 12.5.1 Can you quickly now factor n = 899 without using Sage? 


Solution. Hint: be smart about it. Think strategically; how should I have
chosen a public modulus $n$ to make this hard to do? How should $p$ and $q$ relate?


Hopefully you figured out $p$ and $q$. Then we just need to find an inverse
modulo $\phi(n) = (p-1)(q-1)$ to get our decryption key.


Sage note 12.5.2 Trying your primes yourself. You can fill in the values 
you got for p and q here to make things work. Try it! 


# Fill in the values you found for p and q before evaluating.
# p =
# q =
f = inverse_mod(e, (p-1)*(q-1))
f


When we decrypt, we should get the original message $x = 11$ again.


message^f


This simple example makes it clear why factorization, not just looking for 
primes, might be important. To be truthful, many researchers in factorization 
simply do it to stay one step ahead of the other side, who is presumably also 
researching factorization — so to some extent it is an arms race. 

But factorization is also inherently interesting mathematically! Here is an 
interesting statement, as an example. 


Fact 12.5.3 If I know $\phi(n)$ and $n$, and know that $n$ is a product of exactly two
distinct primes, I can easily compute them both.
Proof. Of course, if we know $\phi(n)$, we already can crack the code, but who
cares; maybe we are given $\phi(n)$ and $n$ and want the factorization. Here is the
short proof.
Suppose the (as yet unknown) primes are $p$ and $q$. Then expand our formula
to

$\phi(n) = (p-1)(q-1) = pq - p - q + 1 = n - (p+q) + 1$


We now can represent both $p + q$ and $pq$ as formulas in $n$ and $\phi(n)$:
• $p + q = n - \phi(n) + 1$
• $pq = n$
Where might we have a formula with $p + q$ and $pq$? That should seem familiar:


$(x - p)(x - q) = x^2 - (p+q)x + pq$


So we can simply use the quadratic formula on this expression to get the values
for $p$ and $q$!


$x = \frac{(p+q) \pm \sqrt{(p+q)^2 - 4pq}}{2} = \frac{n - \phi(n) + 1 \pm \sqrt{(n - \phi(n) + 1)^2 - 4n}}{2}$
Example 12.5.4 Continuing the example above,


$x^2 - (899 - 840 + 1)x + 899 = x^2 - 60x + 899 = 0$


$\frac{60 \pm \sqrt{60^2 - 4(1)(899)}}{2(1)} = 30 \pm \frac{\sqrt{3600 - 3596}}{2} = 30 \pm 1 = 29, 31$.
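The computation in Fact 12.5.3 and Example 12.5.4 can be sketched in a few lines of plain Python (not Sage). The function name `factor_from_phi` is an illustrative choice; the numbers $n = 899$, $\phi(n) = 840$, $e = 13$, and the ciphertext 21 are the ones from the running example. The modular inverse via `pow(e, -1, m)` assumes Python 3.8 or later.

```python
from math import isqrt

def factor_from_phi(n, phi):
    """Recover primes p <= q with p*q = n from n and phi(n) = (p-1)(q-1)."""
    total = n - phi + 1                 # this is p + q
    discriminant = total * total - 4 * n  # this is (q - p)^2
    root = isqrt(discriminant)
    if root * root != discriminant:
        raise ValueError("n and phi are not consistent with n = p*q")
    return (total - root) // 2, (total + root) // 2

p, q = factor_from_phi(899, 840)
print(p, q)

# With the factors in hand, the toy message can be decrypted.
e = 13
f = pow(e, -1, (p - 1) * (q - 1))   # decryption exponent
recovered = pow(21, f, 899)         # 21 was the encrypted message
print(f, recovered)
```

Using the integer square root keeps everything in exact arithmetic, which matters once $n$ is too large for floating point to represent precisely.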


12.5.2 Trial division 


The first, and oldest, method of factoring is one you already know, and maybe 
used a few minutes ago — trial factorization, or trial division. It is the 
method we used with the Sieve of Eratosthenes; you just try each prime number, 
one by one. 

In Algorithm 6.2.3, do you remember what the highest number you would 
have to try is in order to factor a given n by trial division? (Can you prove 
it?) 

The following algorithm does this very naively (and slowly, even for trial 
division). Let’s try to talk through what each step does. 


Sage note 12.5.5 Code for trial division. This is one of the few places 
where it really is important to follow the code. That said, the details of the 
syntax are not as important as the algorithm — unless you want to harness the 
power of computers more effectively! 


import math

def TrialDivFactor(n):            # We define the function
    p = next_prime(1)             # We start off by testing the next prime after 1
    top = ceil(math.sqrt(n))      # This was proved to be the biggest number we need
    while p < top:                # As long as the prime is less than that bound, we keep going
        if mod(n,p)==0:           # In this case, p divides n and we're done!
            break                 # This is Python's way of saying we are done searching
        p=next_prime(p)           # Otherwise, we try the next prime until we're done looking
    if n==1:                      # We probably could have checked for this right away
        print("1 is not prime")   # Well, 1 is not a prime!
    elif p==n:                    # If we get all the way through and end with a prime...
        print(n,"is prime")       # Then our number was prime
    elif mod(n,p)==0:             # But otherwise... (!)
        print(n,"factors as",p,"times",n/p)  # We have a factorization!
    else:                         # And finally...
        print(n, "is prime")      # We must have gotten lucky.


Algorithm 12.5.6 Trial Factorization. To factor $n$, first enumerate the
primes in ascending order $p_1, p_2, \ldots, p_k$, where $p_k$ is the largest prime less than
or equal to $\sqrt{n}$. For each prime in order, check whether $p_i \mid n$. If it does,
terminate by returning $p_i$ and $n/p_i$; if none of them divides $n$, then $n$ must be prime.
Now let me verify it works on easy examples. Remember, we are just
looking for factors at this point, not complete factorizations.


for z in range(1,18):
    TrialDivFactor(z)


1 is not prime
2 is prime
3 is prime
4 factors as 2 times 2
...
17 is prime

Okay, so this seems reasonable. But it’s a little more problematic when you
try to do large numbers, where large means “bigger than you can do by hand,
but nowhere close to the size we looked at in general.” I’ll actually time¹⁴ how
long it takes.


TrialDivFactor(6739815371)
timeit('TrialDivFactor(6739815371)')


6739815371 factors as 13099 times 514529 
5 loops, best of 3: 76 ms per loop 


Sage actually implements this in a much faster way, primarily by using
optimized integers and a special version of Python that allows turning it into
much faster code in the C language (Cython). Notice that the command returns
just a single factor, giving another slight speedup.


print(6739815371.trial_division())
timeit('6739815371.trial_division()')


13099
625 loops, best of 3: 43 µs per loop


That’s roughly one thousand times faster for the initial example! Naturally, 
it’s possible to speed up even more. Sometimes getting the full factorization 
slows us back down; after all, one has to check that the remaining factor is 
prime (or factor it, if it isn’t), so checking this is worth it too. 


print(6739815371.factor())
timeit('6739815371.factor()')


13099 * 514529
625 loops, best of 3: 23 µs per loop


Even for the following smaller number it takes some actual time; here is
where one sees the difference between different implementations of the same
algorithm.


timeit('TrialDivFactor(997*991)')


125 loops, best of 3: 1.63 ms per loop


timeit('(997*991).trial_division()')


625 loops, best of 3: 3.26 µs per loop


timeit('(997*991).factor()')


625 loops, best of 3: 8.5 µs per loop


¹⁴ Unfortunately Sage interacts do not currently support using timeit in an interact
properly.


12.5.3 Starting in the middle 


So much for trial division! But we have other tools at our disposal. 

Some of you might have tried something other than straight trial factor- 
ization when attacking n = 899 from our earlier problem. Reason this way; 
since we know that someone is trying to protect a secret, they probably are 
not going to pick a number with primes like 3 and 5 in it. After all, that would 
be too easy to factor. 

In fact, it stands to reason that the primes p and q should be relatively 
large compared to n — so why not start in the middle? 

This was Fermat’s idea for factoring larger numbers. However, he didn’t 
just start with primes in the middle; for one thing, if your number is even 
somewhat big and you don’t have a computer or huge list of primes, how 
would you know where to start? So Fermat became clever, as always, and used 
an algebraic identity to help himself along. 


Fact 12.5.7 Write $n = ab$, with $a > b$, and assume $n$ is odd. Then we can
write $n$ as a difference of two square numbers.


Proof. Namely, $n$ is the difference of the squares of $s = \frac{a+b}{2}$ and $t = \frac{a-b}{2}$,


$s^2 - t^2 = \left(\frac{a+b}{2}\right)^2 - \left(\frac{a-b}{2}\right)^2 = \frac{a^2 + 2ab + b^2}{4} - \frac{a^2 - 2ab + b^2}{4} = ab = n$.


Remark 12.5.8 Why is it fine to assume $n$ is odd in these circumstances?
This may seem like an obscure identity to us, but at the time (and even
well into the last century) such identities were the bread and butter of algebra,
before we had tools like computers to help us along.
So what Fermat did¹⁵ was try this identity backwards. Here is his strategy.


Algorithm 12.5.9 The Fermat factorization algorithm. To find a factor
for a number $n$, begin by seeking a perfect square $s^2$ bigger than $n$, but still as
close as possible. Now, do the following until you succeed, increasing $s$ by one
each time.


• Check whether $s^2 - n$ is itself a perfect square $t^2$.


• That means we essentially turned


$s^2 - t^2 = n$ around into $s^2 - n = t^2$.


Once you succeed, then $s$ and $t$ are not the factors of $n$; rather, they are


$a = s + t$ and $b = s - t$.
Proof. It should be clear why $a$ and $b$ are the factors. But how do we know
this algorithm terminates?
Assuming you started with $s$ as instructed, eventually you will reach $s = (n+1)/2$,
which is much larger than $\sqrt{n}$. But then $((n+1)/2)^2 - n = \frac{n^2 - 2n + 1}{4} =
((n-1)/2)^2$. You should check that this gives us the trivial factorization
$n = n \cdot 1$, though! (See Exercise 12.7.11) □


¹⁵ At least in the general case; see [E.5.8, II.IV] for his approach for special numbers, such
as the Mersenne number $M_{37} = 2^{37} - 1$.


Here is an implementation of this strategy (again, assuredly slow, but at least
verbose in its explanation). We simply start with the next s above the
square root of n, and just keep trying $s^2 - n$ again and again for bigger and
bigger s.


import math

def FermatFactor(n, verbose=False):
    if n%2==0:
        raise TypeError("Input must be odd!")
    # start with the smallest s whose square is at least n
    s=ceil(math.sqrt(n))
    # at s=(n+1)/2 only the trivial factorization n = n*1 is left
    top=(n+1)/2
    while is_square(s^2-n)==0:
        if verbose:
            print(s,"squared minus",n,"is",s^2-n,"which is not a perfect square")
        s=s+1
    t=sqrt(s^2-n)
    print("Fermat found that",s,"squared minus",t,"squared equals",n)
    if s^2==n:
        print("So",n,"was already a perfect square,",s,"times",s)
    elif s<top:
        print("So",s+t,"times",s-t,"equals",(s-t)*(s+t),"which is",n)
    elif s==top:
        print("So Fermat did not find a factor, which means",n,"is prime!")


Example 12.5.10 Before we move on, let’s try to factor 143 and 93 using this
algorithm. Remember, we start with $s^2 - n$, where s is the next integer above
$\sqrt{n}$, and see if it is a perfect square; then we increase s by one each time.

After you attempt this by hand, you can see what Sage does with them to 
check. 


FermatFactor (143, verbose=True) 


Fermat found that 12 squared minus 1 squared equals 143 
So 13 times 11 equals 143 which is 143 


Well, we struck gold on the first try here! That happens if your number 
is the product of two primes which are two apart. (Such primes are known as 
twin primes, and have some interesting stories. Among other things, calculat- 
ing with them helped find a bug in the Pentium computer chip in 1995; see 
Subsection 22.3.2.) 


FermatFactor (93, verbose=True) 


10 squared minus 93 is 7 which is not a perfect square
11 squared minus 93 is 28 which is not a perfect square 


Fermat found that 17 squared minus 14 squared equals 93 
So 31 times 3 equals 93 which is 93 


As you can see, we probably would have been better off with trial division 
for n = 93. It’s obvious that it’s divisible by 3, but that takes a long time to 
reach from the middle. 
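
To make the comparison concrete, here is a minimal plain-Python sketch that counts how many candidates each method tests. The function names `fermat_steps` and `trial_division_steps` are hypothetical helpers written for this illustration (they are not part of Sage), and the trial division here naively tries every integer d = 2, 3, 4, ... rather than only primes.

```python
from math import isqrt

def fermat_steps(n):
    """Return (a, b, tries) with n == a*b, found by Fermat's method.

    'tries' counts how many values of s were tested.
    Assumes n is an odd positive integer.
    """
    s = isqrt(n)
    if s * s < n:
        s += 1              # smallest s with s^2 >= n
    tries = 1
    while True:
        t2 = s * s - n
        t = isqrt(t2)
        if t * t == t2:     # s^2 - n is a perfect square t^2
            return s + t, s - t, tries
        s += 1
        tries += 1

def trial_division_steps(n):
    """Return (d, tries), where d is the smallest divisor of n above 1.

    'tries' counts how many candidates d = 2, 3, 4, ... were tested.
    """
    d = 2
    tries = 1
    while d * d <= n:
        if n % d == 0:
            return d, tries
        d += 1
        tries += 1
    return n, tries         # no divisor up to sqrt(n), so n is prime

print(fermat_steps(143))          # (13, 11, 1)
print(fermat_steps(93))           # (31, 3, 8)
print(trial_division_steps(93))   # (3, 2)
```

So for 93 the search from the middle needs eight values of s, while even naive trial division finds the factor 3 on its second candidate.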
12.6 A Taste of Modernity


Now, these methods are the beginnings of how people really factor big numbers. 
Typically, one does trial division up to a certain size (maybe the first few 
hundred or thousand primes), then perhaps some modification of Fermat to 
make sure that there aren’t any factors close to the square root if you are 
attacking something like RSA where that would otherwise be advantageous. 

Then what? 

There are many answers to this question, some of which involve cool things 
called continued fractions or number fields. See Exercises 12.7.13-12.7.15 
to investigate these, starting with a simpler (but related) algorithm to Lenstra’s 
in Exercise 12.7.13. See [E.5.2, Chapter 9] for an elementary approach to 
Exercise 12.7.15. 

Another important modern technique that is beginning to show up in in- 
troductory textbooks is Lenstra’s elliptic curve method; see once again either 
[E.4.19] or [E.2.10, Chapter 18.7] for details at the level of this text. 

A rather less elementary, but potentially extremely important, technique
for factoring is Shor’s algorithm. Explaining this quantum computational algorithm
in any detail goes well beyond this text; its main significance is that if a
sizable quantum computer implementing this algorithm could be built, it could
perform the specific task of factoring large RSA moduli in polynomial (rather
than exponential) time. As of this writing, we’re still a long way from reliable
physical implementations of any size, but they are actively being pursued. See
[E.6.12, Section 7.3] for details and even worked-out factorization samples.

We won’t touch more on those topics, but Algorithm 12.4.7 brought up 
a concept important in factoring, not just finding primes. Namely, we could 
come up with some probabilistic/random methods. That’s right, we are going 
to try to find a factor randomly! 


12.6.1 The Pollard Rho algorithm 
Here is the essence of this random (or ‘Monte Carlo’) approach; it is highly
recursive, like many good algorithms.

Algorithm 12.6.1 Generic routine for “random” factoring. Follow
these steps.

• Pick some polynomial that will be easy to compute (mod n).

• Plug in an essentially random seed value. (Often the seed is 2 or 3.)

• Compute the polynomial’s value at the seed.

• If that has a non-trivial gcd with n, we have a factor. Otherwise, plug the
new value back into the polynomial, and repeat (and hope it eventually
succeeds).
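
The generic routine above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the name `generic_random_factor`, the fixed polynomial x^2 + 1, and the `max_rounds` cutoff are choices made here, not something the algorithm prescribes.

```python
from math import gcd

def generic_random_factor(n, seed=2, max_rounds=1000):
    """Iterate x -> x^2 + 1 (mod n) and test gcd(x, n) after each step.

    Returns (d, rounds) with 1 < d < n as soon as one iterate shares a
    factor with n, or None if nothing turns up within max_rounds.
    """
    x = seed
    for rounds in range(1, max_rounds + 1):
        x = (x * x + 1) % n       # plug the previous value back in
        d = gcd(x, n)
        if 1 < d < n:             # a non-trivial common divisor
            return d, rounds
    return None

# For n = 91 the iterates are 5, 26, ... and gcd(26, 91) = 13.
print(generic_random_factor(91))   # (13, 2)
```

Returning `None` rather than raising an error reflects the "hope it eventually succeeds" step: failure is an expected outcome, and the caller may simply try another seed.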

Below is code for the method we’ll discuss in this section. It has a modification
to the generic algorithm which I will discuss below.


def PollardRhoFactor(n, kstop=50, seed=2):
    d=1
    a,b=seed,seed
    k=1
    def f(x):
        return (x^2+1)%n
    while (d==1 or d==n):
        # a moves one step along the sequence, b moves two steps
        a = f(a)
        b = f(f(b))
        d=gcd(a-b,n)
        k=k+1
        if k>kstop:
            print("Pollard Rho breaking off after %s rounds"%k)
            break
    if d>1:
        print("Pollard Rho took %s rounds"%k)
        print("The number it tried in the last round was %s, which shares factor %s"%(a-b,d))
        print("And %s is a factor of %s since %s * %s=%s"%(d,n,d,n/d,d*(n/d)))


The essence of the method is that by plugging in the values of the polyno- 
mial modulo n, we are generating a ‘pseudo-random’ sequence of numbers. 


• $x_0 = 2$ (mod n)
• $x_1 = f(x_0)$ (mod n)
• $x_2 = f(x_1)$ (mod n)
• $x_{i+1} = f(x_i)$, all (mod n).


Such a ‘pseudo-random’ sequence might be better than the sequences we used 
for trial division or Fermat factorization, precisely because it will likely hit 
some small(ish) factors and some fairly large factors, a good mix. It might 
also be good that it could give us numbers which, although not a factor of n, 
might at least share a factor with n. 

A first choice of seed and polynomial might be $x_0 = 2$ and $f(x) = x^2 + 1$.
These choices could be different, but they are typical; John Pollard’s original
paper¹⁶ used $f(x) = x^2 - 1$, for instance.


Example 12.6.2 Let’s try computing what we get for some specific numbers. 
Picking n = 8051 as in [E.2.4, Example 3.25], we get results as in the following 
interact. 


var('x')
@interact
def _(seed=2,n=8051,poly=x^2+1,trials=(10,[2..50])):
    f(x)=poly
    for i in range(trials):
        pretty_print(html("$x_{%s}=%s$"%(i,seed)))
        seed = (ZZ(f(seed)) % ZZ(n))
    pretty_print(html("$x_{%s}=%s$"%(i+1,seed)))


Notice that for n = 8051, the term $x_4 \equiv x_{19}$ (mod n), so the sequence, while
seeming fairly random, will not come up with every possible divisor. With seed
3, $x_{13} \equiv x_{28}$ is the first repeat; if instead we change the modulus to 8053, there
is no repeat through $x_{50}$. So you can see the output will be hard to predict.
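
Repeats of this kind can be located by brute force. The following plain-Python sketch (the helper name `first_repeat` is hypothetical, and the polynomial is fixed to x^2 + 1 for simplicity) records every value seen so far and reports the first index j whose value already occurred at an earlier index i.

```python
def first_repeat(n, seed=2, max_terms=10**6):
    """Find the first repeat in x_{k+1} = x_k^2 + 1 (mod n), with x_0 = seed.

    Returns (i, j) with i < j, where x_j is the first term equal to an
    earlier term x_i, or None if no repeat occurs within max_terms terms.
    """
    seen = {}                      # value -> index of first appearance
    x = seed % n
    for j in range(max_terms):
        if x in seen:
            return seen[x], j
        seen[x] = j
        x = (x * x + 1) % n
    return None

print(first_repeat(8051, seed=2))   # (4, 19)
print(first_repeat(8051, seed=3))   # (13, 28)
```

Storing every value is fine for small moduli like these, but it is exactly the memory cost that the modification discussed next avoids.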


Although the outputs of $x_{i+1} = f(x_i)$ might already seem to be fairly
random, we will not actually try to find common divisors of these numbers with
n. Instead, we will try to see if all the differences $x_i - x_j$ share a common factor
with n, using the (highly efficient to compute) greatest common divisor. That
gives a lot more opportunities to find a common factor than just comparing
with $x_i$! And hopefully it’s just as (or more) ‘random’, and just as effective at
finding factors.

¹⁶ doi.org/10.1007%2Fbf01933667

However, having all possible differences to check might slow things down 
too much. So instead there is a final modification to the algorithm. 

First, since there are finitely many possible outcomes modulo any particular
modulus d, the sequence of results will eventually repeat not just modulo n, but
for any d. In particular, suppose d is a divisor of n such that $\gcd(x_i - x_j, n) = d$
for a specific pair $x_i$ and $x_j$ with $j > i$.

Now consider the values of the sequence of $x_\ell$ modulo d. Because polynomials
are well-defined in modular arithmetic, we have that

$$x_i \equiv x_j \pmod{d}$$

implies that, for any m we have

$$x_{m+i} \equiv f^m(x_i) \equiv f^m(x_j) \equiv x_{m+j} \pmod{d}$$

as well. When we let $m = j - i$, this becomes

$$x_j \equiv x_{2j-i} \pmod{d}$$

which means the sequence (modulo d, the common divisor, not n) repeats itself
every $j - i$ terms (after i, of course, if $j - i < i$).


Now let s be an integer such that $s(j - i) > i$. Then $x_{s(j-i)}$ appears after
the periodic behavior (modulo d) begins, so

$$x_{s(j-i)} \equiv x_{s(j-i)+(j-i)} \equiv \cdots \equiv x_{s(j-i)+s(j-i)} \pmod{d}.$$

If we now let $k = s(j - i)$ this congruence means

$$x_k \equiv x_{2k} \pmod{d}$$

so d is a divisor of $x_{2k} - x_k$ specifically, not just $x_i - x_j$.
Finally, this means instead of checking all possible differences for a common
divisor, we only have to check $\gcd(x_{2k} - x_k, n)$ for all k in the algorithm.
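
As a sanity check on this argument, here is a small plain-Python sketch that searches for the first k with $x_k \equiv x_{2k}$ modulo a known divisor d of n. The helper name `first_k_matching_2k` is hypothetical; walking one copy of the sequence one step at a time and a second copy two steps at a time is the idea usually called Floyd's cycle-finding, and of course in real use d is not known in advance.

```python
def first_k_matching_2k(n, d, seed=2, max_k=10**5):
    """Smallest k >= 1 with x_k = x_{2k} (mod d), where x_{i+1} = x_i^2 + 1 (mod n).

    Here d is assumed to be a divisor of n. Returns None if no such k
    is found up to max_k.
    """
    def f(x):
        return (x * x + 1) % n

    slow = fast = seed
    for k in range(1, max_k + 1):
        slow = f(slow)             # slow is x_k
        fast = f(f(fast))          # fast is x_{2k}
        if (fast - slow) % d == 0:
            return k
    return None

# With n = 9991 = 97 * 103 and d = 97, the sequence mod 97 runs
# 2, 5, 26, 95, 5, 26, 95, ... so x_3 and x_6 agree mod 97.
print(first_k_matching_2k(9991, 97))   # 3
```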


Algorithm 12.6.3 Pollard Rho factoring algorithm. Follow these steps.

• Pick some polynomial f(x) that will be easy to compute (mod n) (such
as $x^2 + 1$, though other quadratics might be used).

• Plug in an essentially random seed value $x_0$. (Often the seed is 2 or 3.)

• Compute the polynomial’s value at the seed, $f(x_0) = x_1$.

• Continue plugging in $f(x_i) = x_{i+1}$, modulo n.

• For each k we check whether $1 < \gcd(x_{2k} - x_k, n) = d < n$.

Since the algorithm doesn’t always find a factor for any given combination 
of seed, number, and polynomial, there is nothing to prove per se. However, 
probabilistically (just like with Miller-Rabin) it should succeed for k in the 
neighborhood of the size of the square root of the smallest factor of n. (This 
is also the justification for introducing the algorithm in the original papers 
introducing this method and its variants.) So if n has a big, but not too big, 
divisor, this test should help us find that divisor. 
Example 12.6.4 Let’s try this with n = 9991. Keeping $x^2 + 1$ and seed 2, the
numbers we would get for the first six rounds are

$x_0 = 2$, $x_1 = 5$, $x_2 = 26$, $x_3 = 677$, $x_4 = 8735$, $x_5 = 8950$,
$x_6 = 4654$, $x_7 = 9220$, $x_8 = 4973$, $x_9 = 3005$, $x_{10} = 8153$.

This gives differences as follows:

• $x_2 - x_1 = 26 - 5 = 21$

• $x_4 - x_2 = 8735 - 26 = 8709$

• $x_6 - x_3 = 4654 - 677 = 3977$

• $x_8 - x_4 = 4973 - 8735 = -3762$

• $x_{10} - x_5 = 8153 - 8950 = -797$


These factor as follows: 


factor(21), factor(8709), factor(3977), factor(3762), factor(797)


(3 * 7, 3 * 2903, 41 * 97, 2 * 3^2 * 11 * 19, 797)


That is an impressive list of nine different prime factors that could potentially
be shared with 9991 in just five iterations. These differences have the
following gcds with 9991:


gcd(9991,21), gcd(9991,8709), gcd(9991,3977), gcd(9991,3762), gcd(9991,797)


(1, 1, 97, 1, 1) 
Indeed the third one already caught a common divisor with 9991. 


PollardRhoFactor(9991)


Pollard Rho took 4 rounds 

The number it tried in the last round was -3977, which 
shares factor 97 

And 97 is a factor of 9991 since 97 * 103=9991 


Remark 12.6.5 This method is called the Pollard rho method because (apparently)
a very imaginative eye can interpret the $x_i$ eventually repeating (mod d)
(in the example, d = 97) as a tail and then a loop, i.e. a Greek ρ. John Pollard
actually has another method named after him, the p − 1 method, which we will
not explore; however, it is related to some of the more advanced methods we
mentioned in the introduction to this section.


Example 12.6.6 Sometimes the rho method doesn’t come up with an answer 
quickly, or at all. 


PollardRhoFactor(991*997)


Pollard Rho breaking off after 51 rounds 


Here we took 50 rounds without success, using the seed 2. Because of the
ρ repetition, it will never succeed. So what do you do then? Bring out the
really advanced methods? Not at all: just as with primality testing, you just
change your starting point to try again!
PollardRhoFactor(991*997, seed=3)


Pollard Rho took 15 rounds 

The number it tried in the last round was 74775, which 
shares factor 997 

And 997 is a factor of 988027 since 997 * 991=988027 
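
The "change your starting point" strategy is easy to automate. Below is a minimal plain-Python sketch; `pollard_rho` and `pollard_rho_retry` are hypothetical names chosen for this illustration, the polynomial is fixed to x^2 + 1, and the list of seeds is an arbitrary assumption. Unlike the Sage function above it returns its result instead of printing it.

```python
from math import gcd

def pollard_rho(n, seed=2, max_rounds=50):
    """One Pollard rho attempt with f(x) = x^2 + 1 (mod n).

    Returns (d, rounds) with 1 < d < n, or None if no factor
    shows up within max_rounds rounds.
    """
    def f(x):
        return (x * x + 1) % n

    a = b = seed
    for rounds in range(1, max_rounds + 1):
        a = f(a)                   # a is x_k
        b = f(f(b))                # b is x_{2k}
        d = gcd(a - b, n)
        if 1 < d < n:
            return d, rounds
    return None

def pollard_rho_retry(n, seeds=(2, 3, 5, 7), max_rounds=50):
    """Try several seeds in turn; return (seed, factor) or None."""
    for seed in seeds:
        result = pollard_rho(n, seed, max_rounds)
        if result is not None:
            return seed, result[0]
    return None

print(pollard_rho(9991))              # (97, 3)
print(pollard_rho_retry(991 * 997))
```

If every seed fails, changing the polynomial is the natural next thing to try.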


12.6.2 More factorization 


In general, there are other such probabilistic algorithms, and they are quite 
successful with factoring numbers which might have reasonably sized but not 
gigantic factors. 


Historical remark 12.6.7 Factoring Fermat. The big success of this
algorithm was the 1980 factorization of $F_8$ (the eighth Fermat number) by
Pollard and Richard Brent (see Brent’s website¹⁷). They used one of several
variants of the method, due to Brent, to find the previously unknown prime
factor¹⁸ 1238926361552897. Finding this factor of $F_8$ took, in total, 2 hours on
a UNIVAC 1100/42 (which for the time period was very fast, indeed).

Interestingly, the other (much larger) factor was not proven to be prime
until significantly later; and as of this writing even $F_{12}$ has not been fully
factored¹⁹! See Subsection 17.5.2 for even more information.

Things don’t automatically work quickly even with today’s far more pow- 
erful hardware. 


PollardRhoFactor(2^(2^8)+1,1000000) # one million rounds!


Pollard Rho breaking off after 1000001 rounds 


Hmm, what now? Let’s change the seed. 


PollardRhoFactor(2^(2^8)+1,1000000, seed=3)


Pollard Rho breaking off after 1000001 rounds 


No one method will factor every number quickly. Luckily, we have bigger
guns at our disposal in Sage (especially in the component program Pari), that
polish things off rather more quickly.


factor(2^(2^8)+1)


1238926361552897 * 93...321 


That is a little better than two hours on a mainframe, or even on your 
computer, I hope you'll agree. 

Real factorization algorithms use several different methods to attack differ- 
ent types of factors. We can try to simulate this in a basic way by creating 
a Sage interact. Evaluate the first cell to define things (don’t forget to keep 
the rho method defined); then you can evaluate the second cell, which is the 
interact. 


import math

def TrialDivFactor(n):
    p = next_prime(1)
    top = ceil(math.sqrt(n))
    # try primes p = 2, 3, 5, ... until one divides n or we pass sqrt(n)
    while p < top:
        if mod(n,p)==0:
            break
        p=next_prime(p)
    if n==1:
        print("1 is not prime")
    elif p==n:
        print(n,"is prime")
    elif mod(n,p)==0:
        print(n,"factors as",p,"times",n/p)
    else:
        print(n,"is prime")


def FermatFactor(n, verbose=False):
    if n%2==0:
        raise TypeError("Input must be odd!")
    s=ceil(math.sqrt(n))
    top=(n+1)/2
    while is_square(s^2-n)==0:
        if verbose:
            print(s,"squared minus",n,"is",s^2-n,"which is not a perfect square")
        s=s+1
    t=sqrt(s^2-n)
    print("Fermat found that",s,"squared minus",t,"squared equals",n)
    if s^2==n:
        print("So",n,"was already a perfect square,",s,"times",s)
    elif s<top:
        print("So",s+t,"times",s-t,"equals",(s-t)*(s+t),"which is",n)
    elif s==top:
        print("So Fermat did not find a factor, which means",n,"is prime!")


@interact
def _(n=991*997,method=['trial','Fermat','Pollard Rho']):
    if method=='trial':
        TrialDivFactor(n)
    if method=='Fermat':
        FermatFactor(n)
    if method=='Pollard Rho':
        PollardRhoFactor(n)


¹⁷ maths-people.anu.edu.au/~brent/F8.html

¹⁸ If you want to memorize this historic number, Brent provides the phrase “I am now
entirely persuaded to employ the method, a handy trick, on gigantic composite numbers” to
help you remember it.

¹⁹ www.prothsearch.com/fermat.html#Complete


Sage note 12.6.8 Building interacts. An interact is just a Sage/Python
function, except with @interact before it. There are many different input
widgets you can use; this one demonstrates using a list and an input box which
takes any input. See the interact documentation²⁰ or Quickstart²¹ for many
examples and more details.


²⁰ web.archive.org/web/20180830102356/http://doc.sagemath.org/html/en/reference/notebook/sagenb/notebook/interact.html#sagenb.notebook.interact.interact

²¹ doc.sagemath.org/html/en/prep/Quickstarts/Interact.html
If you think this sort of thing is cool, the Cunningham Project²² is a place to
explore. I particularly like their Most Wanted lists. The idea is this:


The Cunningham Project seeks to factor the numbers $b^n \pm 1$ for
b = 2, 3, 5, 6, 7, 10, 11, 12, up to high powers n.


Another interesting resource is Sage developer Paul Zimmermann’s Integer
Factoring Records²³. Finally, Wagstaff’s The joy of factoring [E.4.12] has tons
of awesome examples and procedures (far too many, really), as well as an
excellent discussion of how to speed up trial division, etc.


12.7 Exercises 


1. Check the multiplication needed in Lemma 12.1.2.


2. Prove the statement of Lemma 12.1.2 in the case that @ is odd. Hint: the
factorization you find will be similar, but have a subtle change.


3. Explain why the extension to Fermat’s Little Theorem just before 
Fact 12.2.2 (or Exercise 9.6.3) is true. 


4. Check that 1729 and 2821 are Carmichael numbers. 

5. Find a Carmichael number of the form 7 · 23 · p for a prime p; include all
reasoning.

6. Use either the Fermat or Mersenne coprime facts 12.1.4,12.1.11 to provide 
a different proof that there are infinitely many primes. 


7. Prove that if n is composite then so is $M_n$. Hint: Exercise 12.7.2.


Exercise Group. For the next two exercises, pick some 4-6 digit numbers
that don’t share a factor with 30030 = 2 · 3 · 5 · 7 · 11 · 13. (Try not to do this
by just multiplying other, larger primes, or the following exercises will be less
interesting.)


8. Find factors of the numbers you picked using trial division (Algo- 
rithm 12.5.6). 


9. Find factors of the numbers you picked using Fermat Factorization 
(Algorithm 12.5.9). 


10. Try to create a number that takes five steps to factor using both Fermat 
and trial division. (Can you do seven steps?) 


11. Verify the last bit of the proof of The Fermat factorization algorithm. 


12. Try using the Pollard Rho factoring algorithm on a large number you 
create out of a few big primes (not too big!) with different seeds. Can you 
get it to take longer than a few turns? Get your prize numbers; now try 
factoring again with this method where you have changed the polynomial 
to x? +1 or something else other than x? + 1. 


Exercise Group. There are many, many methods of factoring to explore! 
Try looking up some of them in the following exercises. Be warned that some 
of these are pretty deep, though there are good undergraduate-focused expla- 
nations out there for all of them. 
13. Investigate the Pollard p − 1 algorithm. How is it similar to the
methods mentioned in this chapter, how is it different? (See [E.4.19]
or [E.2.10] for connecting it to Lenstra’s elliptic curve algorithm.)


14. (Hard.) Find out what a continued fraction is. Then investigate the
continued fraction factorization algorithm. How is it similar to the
methods mentioned in this chapter, how is it different?


15. (Quite hard.) Find out what a quadratic number field is. Then investigate
the quadratic sieve factorization algorithm. How is it similar
to the methods mentioned in this chapter, how is it different? What
does it have to do with Algorithm 12.5.9?


16. In 17 Lectures on Fermat Numbers: From Number Theory to Geometry
by Michal Krizek, Florian Luca, and Lawrence Somer, the example is
given that 6 is pseudoprime base 4. Find two other pseudoprimes base 4;
obviously they should be greater than 4, but you shouldn’t have to look
beyond 50, either.


²² homes.cerias.purdue.edu/~ssw/cun/

²³ www.loria.fr/~zimmerma/records/factor.html


Summary: An Introduction to Cryptography 


There are many mathematical issues that arise in analyzing even these basic
cryptographic systems, especially ones dealing with primes and composites.


1. Two impractical, but historically important, sources of new prime numbers
are Fermat primes and Mersenne primes.

2. The road to modern primality testing starts with the notion of Pseudoprimes.
It isn’t the end of the road, because we still have prime impostors
in Subsection 12.2.2.

3. Hence we dig further into Miller’s test for base a, which comes from our
observations of how powers work in modular arithmetic.

4. Finally, in Section 12.4 we see a modern, probabilistic primality test.

5. Factoring is very important in testing the security of cryptography. We
examine some very basic techniques, including The Fermat factorization
algorithm.

6. We see just a bit of more modern methods in Section 12.6, which should
prepare you for more advanced ideas.


As always, there are Exercises to practice, but also to understand the theory 
better. 
Chapter 13


Sums of Squares 


We have now more or less exhausted a lot of what we can do with linear 
questions, and even gone beyond to many nonlinear ones. With that in mind, 
we return to other considerations. As a warmup for this and ensuing chapters, 
consider the following question. 


Question 13.0.1 Take a positive integer n (or zero) and try to write it as
$n = a^2 + b^2$ for $a, b \in \mathbb{Z}$. For which n is this possible, for which is it not?

It seems that Albert Girard already knew the answer to this question in 
the first quarter of the 17th century, and Fermat discovered it a couple years 
later as well. A full proof of the answer to this question did not come until 
Euler (no surprise here) about six score years after that. 


Historical remark 13.0.2 Albert Girard. Girard¹ is an interesting figure,
less well-known than his contemporaries. He apparently was the first to use
our modern notation for trigonometric functions, and spent his adult life in the
Netherlands escaping religious persecution as a Protestant in France.


Historical remark 13.0.3 Leonhard Euler. Euler is well known for being 
a rather conventional religious family man amidst the Enlightenment court of 
Frederick the Great, and for taking a lot of teasing from Voltaire and the king 
(among other things, for being partly blind at the time). See [E.5.6] for much
more about him and his work² at the level of this text, or over one third of
[E.5.8] for a detailed perspective by an eminent number theorist, or simply
browse the Euler Archive³.

There is a lot more to say about someone universally acknowledged as one of 
the greatest mathematicians of all time, but we already have plenty of Euler’s 
work in this book for you to peruse. 


Historical remark 13.0.4 Pierre de Fermat. We’ve already seen Fermat’s 
work several times (such as Subsubsection 3.4.3.2, Theorem 7.5.3, and Subsec- 
tion 12.1.1), and we’ll see another glimpse of him in Question 15.6.5. About 
the man himself we know less, mostly that he was a jurist in southern France 
who didn’t travel much, but corresponded a fair amount about his mathemat- 
ics, which included prototypes for both differential and integral calculus! As 
with most things about Fermat’s personal life, it’s less well known that he 
also had a religious side; in [E.7.12] a well-known classicist translates a moving 
poem about the dying Christ written in honor of one of Fermat’s friends. See
[E.5.8, Chapter I] for many mathematical, and some personal, details.


¹ mathshistory.st-andrews.ac.uk/Biographies/Girard_Albert/

² There is an immense recent English-language biography about him, but I do not actually
recommend it for most readers.

³ eulerarchive.maa.org

So try out Question 13.0.1! Some things to think about while you try this: 


• Are any special types of numbers easier to write in this way than others?

• Is there any way of generating new such numbers from old ones?

• If some types of numbers are not a sum of squares, how might you prove
this?


A separate question to at least keep track of is this. 


Question 13.0.5 Assuming you can indeed write it in this way, in how many
ways can you write a number as a sum of squares?


This chapter is completely devoted to continuing to address questions about 
writing numbers as a sum of two squares. It will lead us a little far afield, 
of necessity, to ask (and start to answer) questions about congruences again. 
Much of this chapter will be devoted to a geometric proof that certain numbers 
are indeed representable as a sum of two squares. This chapter is a perfect 
illustration of one of the main themes of this text — the unity of mathematics. 


13.1 Some First Ideas 


13.1.1 A first pattern 


Let’s assume you’ve done some exploration on your own. Here’s a first pattern 
that you may have noticed, similarly to patterns in the past. 


Fact 13.1.1 If $n \equiv 3 \pmod{4}$, then n is not writeable as a sum of squares.


Proof. You should be able to prove this pretty easily based on things you
already know about squares modulo 4. (See Exercise 13.7.1) ∎


The next thing to note is that Sage has a nice command to tell us an answer. 


two_squares (29) 


(2, 5) 
If a representation doesn’t exist, we get an error. If it does, Sage returns
two numbers (a,b) such that $a^2 + b^2$ equals your number.
In the next cell, I pick a number for which $n \equiv 1 \pmod{4}$, but this number
cannot be written in this form. Thus Fact 13.1.1 doesn’t take care of all
cases.


two_squares (21) 


Traceback (most recent call last): 


ValueError: 21 is not a sum of 2 squares 


Fact 13.1.2 There are positive integers with remainders 0, 1, and 2 when
divided by four, but which are not representable as a sum of two squares.
Proof. Show that 12, 21, and 6 are not. (See Exercise 13.7.2.) ∎
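
Outside of Sage, both facts are easy to spot-check by brute force. The helper below, `two_squares_or_none`, is a hypothetical plain-Python stand-in written for this illustration (it is not Sage's `two_squares` command); it returns a pair rather than raising an error, and the search bound of 400 is an arbitrary choice.

```python
from math import isqrt

def two_squares_or_none(n):
    """Return (a, b) with a <= b and a^2 + b^2 == n, or None if impossible."""
    for a in range(isqrt(n) + 1):
        b2 = n - a * a
        b = isqrt(b2)
        if b * b == b2:
            return a, b
    return None

print(two_squares_or_none(29))     # (2, 5)

# No n = 3 (mod 4) below 400 is a sum of two squares ...
print(all(two_squares_or_none(n) is None for n in range(3, 400, 4)))   # True

# ... but other remainders do not guarantee a representation either.
print([two_squares_or_none(n) for n in (12, 21, 6)])   # [None, None, None]
```

A finite search like this is evidence, not a proof; the argument modulo 4 is still needed for Fact 13.1.1.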


You can use this interact to explore while avoiding the errors. 


@interact
def _(n=29):
    try:
        a,b = two_squares(n)
        pretty_print(html("We can write ${0}={1}^2+{2}^2$".format(n,a,b)))
    except ValueError:
        pretty_print(html("${0}$ is not a sum of two squares".format(n)))


Sage note 13.1.3 Handling errors. Most computer languages have a way
to “handle” errors if we don’t want to think of them as errors. In Python, this
is the try/except syntax you see above. Basically, we are trying to use the
two_squares command, but if it hiccups, we instead just print a nice message.


Remark 13.1.4 We have already addressed a very special case of writing 
numbers as a sum of squares. In fact, in Theorem 3.4.6 we saw a precise 
characterization of when a perfect square is a sum of two squares. We will 
mention this again briefly in Subsection 14.2.2. 


13.1.2 Geometry 


Next, we can interpret this question very differently, relying on our geometric 
intuition. Figure 13.1.5 helps us visualize the problem. 


Figure 13.1.5 Five as a sum of squares 


In Figure 13.1.5, if $n = a^2 + b^2$, then n is the square of the radius of a circle
which has (a,b) as the coordinates of a point. So the sum of squares problem
is actually a geometric one! Try it interactively below.


@interact
def _(n=(5,list(range(100)))):
    viewsize=ceil(math.sqrt(n))+2
    g(x,y)=x^2+y^2
    p = implicit_plot(g-n, (-viewsize,viewsize), (-viewsize,viewsize), plot_points = 100)
    lattice_pts = [[i,j] for i in [-viewsize..viewsize] for j in [-viewsize..viewsize]]
    plot_lattice_pts = points(lattice_pts,rgbcolor=(0,0,0),pointsize=2)
    curve_pts = [coords for coords in lattice_pts if g(coords[0],coords[1])==n]
    if len(curve_pts)==0:
        show(p+plot_lattice_pts, figsize = [5,5], xmin = -viewsize, xmax = viewsize, ymin = -viewsize, ymax = viewsize, aspect_ratio=1)
    else:
        plot_curve_pts = points(curve_pts, rgbcolor = (0,0,1),pointsize=20)
        show(p+plot_lattice_pts+plot_curve_pts, figsize = [5,5], xmin = -viewsize, xmax = viewsize, ymin = -viewsize, ymax = viewsize, aspect_ratio=1)


That is, we can rewrite Questions 13.0.1 and 13.0.5 like this.

Question 13.1.6

• Which circles around the origin do (or do not) have lattice points?

• If a circle has lattice points, how many does it have?
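
For small n these questions can be explored without any plotting. The sketch below, in plain Python, simply lists the lattice points on the circle $x^2 + y^2 = n$; the function name `lattice_points_on_circle` is a hypothetical helper for this illustration.

```python
from math import isqrt

def lattice_points_on_circle(n):
    """All integer points (x, y) with x^2 + y^2 == n, for an integer n >= 0."""
    r = isqrt(n)                   # no coordinate can exceed sqrt(n) in size
    return [(x, y)
            for x in range(-r, r + 1)
            for y in range(-r, r + 1)
            if x * x + y * y == n]

print(len(lattice_points_on_circle(5)))    # 8
print(len(lattice_points_on_circle(25)))   # 12
print(len(lattice_points_on_circle(3)))    # 0
```

Note that this counts points with signs and order, so the eight points for n = 5 all come from the single representation $5 = 1^2 + 2^2$.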


We will choose to address these questions by connecting to geometry. There 
are many ways; for instance, in Section 20.1 we will connect to calculus ideas 
in number theory. 


13.1.3 Connections to some very old mathematics 


The following identity was, separately, already known to Diophantus (remem- 
ber Diophantine equations?) around 250, to Brahmagupta (about whom more 
in Historical remark 15.5.6 and Section 15.6) around 600, and to Leonardo of 
Pisa (known also as Fibonacci) around 1250. 


Fact 13.1.7 Brahmagupta-Fibonacci identity.

$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$$

Proof. Multiply and cancel; see Exercise 13.7.6. ∎
This sort of identity may seem amazing to us, but to people used to needing 
lots of symbolic manipulation, it was just part of a toolkit by the time number 
theory began ascending with Fermat or Euler. 


Historical remark 13.1.8 Fibonacci. Leonardo of Pisa⁴ grew up among
Italian merchants in North Africa and learned much mathematics there; we 
have seen him a few times already in his eponymous numbers (Exercise 2.5.17) 
and in Exercise 5.6.22. However, while it seems pretty clear Fibonacci borrowed 
extensively from the Islamic mathematical heritage for many of his problems, 
not only was his Liber Abaci very influential for spreading our modern decimal 
system into Europe (from India via the Islamic world), but he did nontrivial 
original number theory work as well (see [E.5.3, Section 8-4]). 


What is useful about this identity is that it implies the following. 


⁴ mathshistory.st-andrews.ac.uk/Biographies/Fibonacci/

Fact 13.1.9 Products of numbers writeable as sums of squares can also be
written as sums of squares!


Proof. Use 13.1.7 above. ∎
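
In code, the identity turns two representations into a representation of the product. This is a minimal plain-Python sketch; `compose_two_squares` is a hypothetical name, and the absolute value is taken only so that the result is a pair of nonnegative integers.

```python
def compose_two_squares(rep1, rep2):
    """Given (a, b) and (c, d), return (|ac - bd|, ad + bc).

    By the Brahmagupta-Fibonacci identity, the squares of the returned
    pair sum to (a^2 + b^2) * (c^2 + d^2).
    """
    (a, b), (c, d) = rep1, rep2
    return abs(a * c - b * d), a * d + b * c

# 13 = 3^2 + 2^2 and 8 = 2^2 + 2^2 give 104 = 2^2 + 10^2.
x, y = compose_two_squares((3, 2), (2, 2))
print(x, y, x * x + y * y)   # 2 10 104
```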


@interact
def _(m=(13,[0..100]),n=(8,[0..100])):
    try:
        a,b = two_squares(m)
        c,d = two_squares(n)
        pretty_print(html(r"We know we can write ${6}={0}\cdot {1}$ as $({2}^2+{3}^2)({4}^2+{5}^2)$".format(m, n, a, b, c, d, m*n)))
        pretty_print(html(r"But it is also writeable as $({0}\cdot {1}-{2}\cdot {3})^2 + ({0}\cdot {3}+{1}\cdot {2})^2 = {4}^2+{5}^2={6}$".format(a, c, b, d, abs(a*c-b*d), a*d+b*c, m*n)))
    except ValueError:
        pretty_print(html("Please pick numbers that are both writeable as a sum of two squares"))


A final question for the reader is to ponder why this means that we can really 
reduce the question to whether primes are writeable as a sum of squares. 


13.2 At Most One Way For Primes 


Most of the rest of this chapter is dedicated to proving what we can about how 
to write numbers as sums of squares. We will begin our proofs by talking about 
how many ways we can write some numbers as a sum of squares. Namely, we’ll 
connect sums of squares to factorization. 

Remember that the Brahmagupta-Fibonacci identity says that if two num- 
bers are sums of two squares, so is their product. Remarkably, we can sort of 
do this backwards. 

First, we need to say what we might mean by writing a number as a sum 
of squares in two essentially different ways. Compare 


$$13 = 3^2 + 2^2 = 2^2 + 3^2$$

to the situation

$$25 = 5^2 + 0^2 = 3^2 + 4^2.$$

We say the latter ways are essentially different, because they involve two
different pairs of nonnegative integers.
It is not a coincidence that 13 is prime, while the number 25 which has two
ways to be written is composite.
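
To experiment with this notion, the following plain-Python sketch lists the essentially different representations of a number, that is, the pairs $0 \le a \le b$ with $a^2 + b^2 = n$. The helper names `essentially_different_reps` and `is_prime` are hypothetical, and the bound 200 in the usage example is arbitrary.

```python
from math import isqrt

def essentially_different_reps(n):
    """All pairs (a, b) with 0 <= a <= b and a^2 + b^2 == n."""
    reps = []
    a = 0
    while 2 * a * a <= n:          # guarantees a <= b
        b2 = n - a * a
        b = isqrt(b2)
        if b * b == b2:
            reps.append((a, b))
        a += 1
    return reps

def is_prime(n):
    """Naive primality check by trial division (fine for small n)."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

print(essentially_different_reps(13))   # [(2, 3)]
print(essentially_different_reps(25))   # [(0, 5), (3, 4)]
print(essentially_different_reps(65))   # [(1, 8), (4, 7)]

# Among primes below 200, none has more than one such representation.
print(max(len(essentially_different_reps(p)) for p in range(200) if is_prime(p)))   # 1
```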


Fact 13.2.1 If an odd number $N$ is writeable in two essentially different
(nonnegative) ways as a sum of two squares, then $N = yz$, where $y, z > 1$ and
$y, z$ are themselves writeable as sums of two squares.

Proof. Assume first that

$$N = a^2 + b^2 = c^2 + d^2$$

with $a, c$ odd and $b, d$ even nonnegative integers. Then, assuming $a > c$ and
$d > b$, let

$$k = \gcd(a - c, d - b) \quad\text{and}\quad n = \gcd(a + c, d + b).$$

Both $k$ and $n$ are even, and

$$\ell = \frac{a - c}{k} = \frac{d + b}{n}, \qquad m = \frac{a + c}{n} = \frac{d - b}{k}.$$

Then we get that

$$N = \left[\left(\frac{k}{2}\right)^2 + \left(\frac{n}{2}\right)^2\right]\left(\ell^2 + m^2\right).$$

There are some details remaining here, especially in terms of verifying all these
numbers exist, but they mostly just use the definitions of gcd and parity. See
Exercise Group 13.7.8-11. ∎


Example 13.2.2 Let’s examine $N = 25$. First, what are $a, b, c, d$?
Once you have computed them, you should confirm that $k = \gcd(2, 4) = 2$ and
$n = \gcd(8, 4) = 4$, which means $\ell = 1$ and $m = 2$. This yields

$$25 = \left[\left(\frac{2}{2}\right)^2 + \left(\frac{4}{2}\right)^2\right]\left(1^2 + 2^2\right) = 5 \cdot 5.$$

So 25 is a product of numbers, each themselves writeable as a sum of two
squares.
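The recipe in Fact 13.2.1 can be turned into a few lines of code. This is a sketch in plain Python under the hypotheses stated in the proof ($a, c$ odd, $b, d$ even, $a > c$, $d > b$); the function name `euler_factor` is hypothetical, and no input checking is done.

```python
from math import gcd


def euler_factor(a, b, c, d):
    """Given N = a^2 + b^2 = c^2 + d^2 with a, c odd, b, d even,
    a > c and d > b, return the two factors of N from Fact 13.2.1."""
    k = gcd(a - c, d - b)
    n = gcd(a + c, d + b)
    l = (a - c) // k          # also equals (d + b) // n
    m = (d - b) // k          # also equals (a + c) // n
    return (k // 2) ** 2 + (n // 2) ** 2, l * l + m * m


print(euler_factor(5, 0, 3, 4))   # the example N = 25
print(euler_factor(7, 4, 1, 8))   # N = 65 = 7^2 + 4^2 = 1^2 + 8^2
```

Each returned factor is visibly a sum of two squares, which is the point of the Fact.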


Remark 13.2.3 This method for factoring is apparently due to Euler; see
Wikipedia⁵, which references [E.5.3]. An interesting generalization for the
situation where one has two different ways to write an odd integer as a sum of
the form $mx^2 + ny^2$ for positive $m, n$ may be found in [E.7.30].


It is now nearly trivial to prove the following. 


Proposition 13.2.4 A prime is writeable in zero or one (positive) way as a
sum of two squares.

Proof. This is clear for $p = 2$. It remains to consider the case of $p$ odd. If $p$ is
writeable in two essentially different ways, it factors by Fact 13.2.1. But prime
numbers don’t factor nontrivially, so there can be at most one way to do it.

Note that there could be zero ways to write $p$. If $p > 2$ happens to satisfy
$p \equiv 3 \pmod{4}$, Fact 13.1.1 says as much, so the use of Fact 13.2.1 in the first
paragraph is really only being applied to $p \equiv 1 \pmod{4}$. ∎

For example, in Figure 13.2.5 we see that thirteen is only writeable as $3^2 + 2^2$
(or $2^2 + 3^2$).
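A brute-force count makes the same point without pictures. The sketch below is plain Python; `representations` and `is_prime` are helper names of my own, not Sage functions.

```python
from math import isqrt


def representations(n):
    """All pairs (a, b) with 0 <= a <= b and a^2 + b^2 = n."""
    reps = []
    for a in range(isqrt(n // 2) + 1):
        b = isqrt(n - a * a)
        if b * b == n - a * a:
            reps.append((a, b))
    return reps


def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


print(representations(13), representations(25))
for q in range(2, 60):
    if is_prime(q):
        print(q, representations(q))
```

Every prime in the list shows either no pair or exactly one, while the composite 25 shows two.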


⁵ en.wikipedia.org/wiki/Euler's_factorization_method

Figure 13.2.5 Thirteen in just one way

We can confirm Proposition 13.2.4 visually in many cases, in that each of 
the circles with radius squared a prime either has no lattice points, or its only 
positive lattice points are (a,b) and (b,a) for one a and b. 


@interact
def _(n=(5,prime_range(150))):
    viewsize=ceil(math.sqrt(n))+.5
    g(x,y)=x^2+y^2
    p = implicit_plot(g-n, (-1,viewsize), (-1,viewsize),
        plot_points = 100)
    lattice_pts = [[i,j] for i in [-1..viewsize] for j in
        [-1..viewsize]]
    plot_lattice_pts = points(lattice_pts, rgbcolor=(0,0,0),
        pointsize=2)
    curve_pts = [coords for coords in lattice_pts if
        g(coords[0],coords[1])==n]
    if len(curve_pts)==0:
        show(p+plot_lattice_pts, figsize = [5,5], xmin = -1,
            xmax = viewsize, ymin = -1, ymax = viewsize,
            aspect_ratio=1)
    else:
        plot_curve_pts = points(curve_pts, rgbcolor =
            (0,0,1), pointsize=20)
        show(p+plot_lattice_pts+plot_curve_pts, figsize =
            [5,5], xmin = -1, xmax = viewsize, ymin = -1,
            ymax = viewsize, aspect_ratio=1)


13.3 A Lemma About Square Roots Modulo n


We’ll continue our formal investigation of what numbers are sums of two
squares by taking a look at a seemingly unrelated lemma about square roots.
In Section 14.1 we’ll see that square roots of negative one (thinking of $-1 \in \mathbb{Z}$,
not $\mathbb{Z}_n$) are connected to sums of squares as well, so it is not completely
implausible to connect roots and these sums.

Before we do this, let’s codify something we already have discussed since 
Question 7.1.1 at various times, e.g. in Fact 7.3.1 or Section 7.6. 


Definition 13.3.1 We say that a number $a$ has a square root modulo $n$ if
there is some number $x$ with

$$x^2 \equiv a \pmod{n}.$$

As an example using this framework, here is an alternate proof of
Exercise 7.7.12.


Fact 13.3.2 For an odd prime $p$, the only way there is a square root of $-1$
modulo $p$ is if $p \equiv 1 \pmod{4}$.

Proof. We will use group theory to prove this.

Assume there is a square root $f$, so that

$$f^2 \equiv -1 \pmod{p}.$$

Then the order of $f$ in $U_p$ is four, since $f^2 \equiv -1 \not\equiv 1$ (as $p$ is odd) while

$$f^4 = (f^2)^2 \equiv (-1)^2 = 1.$$

We know that the order of the group is

$$|U_p| = p - 1,$$

but then Lagrange’s (group theory) Theorem 8.3.12 says that four divides $p - 1$.
Given that, the only possible kind of prime $p$ solving this is of the form $4k + 1$. ∎
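Fact 13.3.2 can be checked by brute force for small primes. The following plain Python sketch simply tries every residue; the function name `square_roots_of_minus_one` is chosen here for illustration.

```python
def square_roots_of_minus_one(p):
    """All x in 1..p-1 with x^2 ≡ -1 (mod p), found by exhaustive search."""
    return [x for x in range(1, p) if (x * x + 1) % p == 0]


for p in [3, 5, 7, 11, 13, 17, 19, 23, 29]:
    print(p, p % 4, square_roots_of_minus_one(p))
```

For the primes with remainder 3 modulo 4 the list comes back empty, as the Fact requires; that the other primes always produce roots is what the next lemma has to establish.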

Remember, this means there can’t be a square root of minus one if
$p \equiv 3 \pmod{4}$. Of course, it also only means that there might be one if
$p \equiv 1 \pmod{4}$, so we certainly need the following lemma to confirm there is one.
(See its use in Subsection 16.1.1, where we combine everything into Fact 16.1.2.)


Lemma 13.3.3 For an odd prime $p \equiv 1 \pmod{4}$, there actually does exist a
square root of $-1$ modulo $p$. That is, there is an $f$ such that

$$f^2 \equiv -1 \pmod{p}.$$

Before we start the proof of Lemma 13.3.3, recall Wilson’s Theorem, which
states that

$$(p - 1)! \equiv -1 \pmod{p} \text{ for primes } p.$$

Do you remember our proof? We paired up all the numbers from 2 to $p - 2$ in
pairs of multiplicative inverses (mod $p$), thus:

$$(p - 1)! \equiv 1 \cdot (2 \cdot 2^{-1}) \cdot (3 \cdot 3^{-1}) \cdots (p - 1) \equiv (p - 1) \equiv -1 \pmod{p}.$$

Our strategy for this proof will be similar, using all numbers from 1 to $p - 1$,
but paired up in a different way.


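Proof of Lemma 13.3.3. Pair up the numbers from 1 to $p - 1$ in a product, in
pairs of additive inverses (mod $p$):

$$(p - 1)! = 1 \cdot (p - 1) \cdot 2 \cdot (p - 2) \cdot 3 \cdot (p - 3) \cdots \frac{p - 1}{2} \cdot \frac{p + 1}{2}$$

$$= \left[1 \cdot 2 \cdot 3 \cdots \frac{p - 1}{2}\right] \cdot \left[(p - 1) \cdot (p - 2) \cdots \frac{p + 1}{2}\right].$$

This makes sense because $(p - 1)/2$ is an integer halfway between 1 and $p$, as
$p$ is odd.

If we rethink things (mod $p$), we can rewrite this in a more suggestive way. Let
$\left(1 \cdot 2 \cdot 3 \cdots \frac{p - 1}{2}\right)$ be called $f$. This is also $\left(\frac{p - 1}{2}\right)!$ of course. Then, keeping in
mind that $\frac{p + 1}{2} = p - \frac{p - 1}{2}$,

$$\left[1 \cdot 2 \cdot 3 \cdots \frac{p - 1}{2}\right] \cdot \left[(p - 1) \cdot (p - 2) \cdots \frac{p + 1}{2}\right]$$

$$\equiv f \cdot (-1)^{\frac{p - 1}{2}} \left[1 \cdot 2 \cdot 3 \cdots \frac{p - 1}{2}\right] \equiv (-1)^{\frac{p - 1}{2}} f^2 \pmod{p}.$$

Remember that our hypothesis is $p \equiv 1 \pmod{4}$. Then $p = 4k + 1$ for some integer
$k$, so $\frac{p - 1}{2} = 2k$ is even and by Wilson’s Theorem

$$-1 \equiv f^2 \pmod{p}.$$

∎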
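To see the pairing at work, here is the computation for $p = 13$ written out; it is only a worked instance of the proof above, with the arithmetic done by hand.

```latex
\begin{align*}
12! &= (1\cdot 12)(2\cdot 11)(3\cdot 10)(4\cdot 9)(5\cdot 8)(6\cdot 7) \\
    &\equiv \bigl(1\cdot(-1)\bigr)\bigl(2\cdot(-2)\bigr)\cdots\bigl(6\cdot(-6)\bigr)
     = (-1)^{6}\,(6!)^{2} = (6!)^{2} \pmod{13},\\
6!  &= 720 = 55\cdot 13 + 5 \equiv 5 \pmod{13},\\
5^{2} &= 25 = 2\cdot 13 - 1 \equiv -1 \pmod{13}.
\end{align*}
```

So $f = 6! \equiv 5$ is a square root of $-1$ modulo 13, in agreement with Wilson’s Theorem, which says $12! \equiv -1$.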
What is neat about this proof is that it shows there are precisely two square
roots of negative one, as Lagrange’s (polynomial) Theorem 7.4.1 suggests. We
even have a formula for them:

$$f = \pm \left(\frac{p - 1}{2}\right)!$$

where the exclamation point here indicates the factorial. Especially given the
proof, an imaginative mind⁶ might call this, “The square root of Wilson’s
Theorem,” by analogy with Theorem 12.3.2.

Somehow this is a satisfying answer. We can check that these really are 
square roots of —1 using Sage. 


@interact
def _(p=(13,[q for q in prime_range(200) if q%4==1])):
    f=mod(factorial((p-1)/2),p)
    pretty_print(html(r"The potential square roots of $-1$ are $\pm\left(\frac{%s-1}{2}\right)!=%s,%s\text{ (mod }%s)$"%(p,f,-f,p)))
    pretty_print(html(r"And we can compute that ${0}^2\equiv{1}$ and ${2}^2\equiv {3}$ modulo ${4}$".format(f,f^2,-f,(-f)^2,p)))


Remark 13.3.4 A class act. An observant reader may have noticed that
when $p \equiv 3 \pmod{4}$ the Proof of Lemma 13.3.3 can still be used, mutatis
mutandis, to show that $\pm\left(\frac{p - 1}{2}\right)!$ are square roots of 1 modulo $p$. (See
Exercise 13.7.24.) Of course, we already know everything about those; they are just
$\pm 1$ modulo $p$, as you can test out below.

⁶ Such as that of Abraham Holleran, to whom I am indebted for this point.


@interact
def _(p=(11,[q for q in prime_range(200) if q%4==3])):
    f=mod(factorial((p-1)/2),p)
    pretty_print(html(r"The potential square roots of $1$ are $\pm\left(\frac{%s-1}{2}\right)!=%s,%s\text{ (mod }%s)$"%(p,f,-f,p)))
    pretty_print(html(r"And we can compute that ${0}^2\equiv{1}$ and ${2}^2\equiv {3}$ modulo ${4}$".format(f,f^2,-f,(-f)^2,p)))


Here comes the interesting part. If you play around with the interact, you
will notice that sometimes $\left(\frac{p - 1}{2}\right)! \equiv 1$, and sometimes $\left(\frac{p - 1}{2}\right)! \equiv -1$. It’s not
immediately evident whether there is a pattern here.

But there is a formula. Foreshadowing Definition 14.1.2, if we define the
number system $\{a + b\sqrt{-p} \mid a, b \in \mathbb{Z}\}$ (ignoring whether that actually makes
sense to do), one can define a special group (recall Definition 8.3.3) called the
ideal class group of this number system. The order of this group is denoted
by $h$. Nearly miraculously, if $p > 3$ is of this type, then

$$\left(\frac{p - 1}{2}\right)! \equiv (-1)^{(h + 1)/2} \pmod{p}.$$

The default setting of the interact above is for $p = 11$, and checking this
list⁷ we see that $h = 1$ and indeed $\left(\frac{11 - 1}{2}\right)! = 120 \equiv -1 = (-1)^{(1 + 1)/2}$ modulo
11, while for $p = 23$ we have $h = 3$ and $\left(\frac{23 - 1}{2}\right)! = 39916800 \equiv 1 = (-1)^{(3 + 1)/2}$
modulo 23.
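The two cases just quoted, and the apparently erratic sign in general, can be tabulated quickly. This plain Python sketch only computes the half factorial; it does not compute class numbers, and the name `half_factorial_mod` is my own.

```python
from math import factorial


def half_factorial_mod(p):
    """Return ((p - 1)/2)! reduced modulo p."""
    return factorial((p - 1) // 2) % p


for p in [7, 11, 19, 23, 31, 43, 47]:      # primes with p ≡ 3 (mod 4)
    r = half_factorial_mod(p)
    print(p, "+1" if r == 1 else "-1")
```

For $p = 11$ the residue is $10 \equiv -1$ and for $p = 23$ it is 1, matching the values of $h$ cited above.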

There is a similar, but more complicated, formula when p = 1 (mod 4). And 
by ‘complicated’, I mean that if you’ve read this far, you’ve already guessed this 
is one of the most advanced remarks of the text. The class number being greater 
than one is closely related to the factorization question raised in Exercise 6.6.30; 
note that the class number for p = 5 in this setting is 2. For more on all 
this (accessible if you’ve had a decent introduction to rings and fields), see 
[E.4.26, Chapters 10 and 26]. 


13.4 Primes as Sum of Squares 


In the past few sections, one of the many things you may have conjectured 
about sums of squares is that every prime of the form p = 4k +1 can be 
represented as the sum of two squares. Combined with Fact 13.1.9, limiting 
the question to primes should be sufficient to finish analyzing the question for 
any positive number. (See Theorem 13.5.5 for the final steps putting this all 
together.) 

It turns out it is true that p = 4k + 1 can always be written as a sum of
squares, and we will spend most of the remainder of this chapter proving it. 
At the end of the chapter, we’ll add in Fact 13.1.1 about primes of the form 
p = 4k +3 to see exactly which numbers can be thus represented. 


Remark 13.4.1 To keep with the theme of the unity of mathematics, we do
this geometrically, not algebraically as in most texts, though the core ideas
are similar with both proofs. We roughly follow [E.2.1, Chapter 10.6], but
expanded greatly to avoid any direct reference to Hermann Minkowski’s theorem
on lattice points in a convex symmetric set. Interestingly, [E.4.16, Theorems
4.3 and 8.3] only states this and Lagrange’s four square theorem, precisely
because although Minkowski’s Theorem provides a general framework for
existence of such points geometrically, one still requires information about
quadratic residues to provide lattice points to work on in the first place.

⁷ www.numbertheory.org/classnos/


13.4.1 A useful plot 


First, let’s look at the following plot on the integer lattice. As you can see, 
I am plotting certain points on the circle $x^2 + y^2 = n$, with $n = 5$ to begin.
I have done some ‘magic’ to turn the square root of —1 (mod n) into these 
points. Before telling you the magic, Figure 13.4.2 (and the interact following 
it) will help us get ready. 


Figure 13.4.2 An additional lattice


To be precise, I’ve used this square root of —1 to create the regularly 
spaced grid of blue points. You can think about it as a bunch of corners of 
parallelograms. 


Remark 13.4.3 Sometimes we call things like the set of blue dots a lattice, 
though in this text I will usually use the word lattice only to refer to the usual 
integer lattice of the black dots. A general lattice is something related to a 
concept from linear algebra — vectors generated by a basis, except instead of 
being vectors over Q or R, they are over Z. 

Here is how I constructed the blue grid. First, assume that $p$ is our prime
and pick $f = \left(\frac{p - 1}{2}\right)!$ as a square root of negative one (or its additive inverse, if
you prefer); we can use the residue modulo $p$ for convenience. Then the blue
points are of the form $(a, af + bp)$ for all integers $a, b$.
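Stripped of the plotting, the construction amounts to a congruence test: $(x, y)$ is blue exactly when $y \equiv fx \pmod{p}$. A plain Python sketch, where `blue_points` and its `bound` parameter are names I am introducing for illustration:

```python
from math import factorial


def blue_points(p, bound):
    """Blue points (x, y) with |x|, |y| <= bound, i.e. y ≡ f*x (mod p)
    where f = ((p - 1)/2)! mod p."""
    f = factorial((p - 1) // 2) % p
    return [(x, y)
            for x in range(-bound, bound + 1)
            for y in range(-bound, bound + 1)
            if (y - f * x) % p == 0]


pts = blue_points(5, 4)
print(pts)
# Every blue point has x^2 + y^2 divisible by p.
print(all((x * x + y * y) % 5 == 0 for x, y in pts))
```

The divisibility seen in the last line is the fact the main proof of this section relies on.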


@interact
def _(p=(5,[q for q in prime_range(200) if q%4==1])):
    f=mod(factorial((p-1)/2),p)
    viewsize=ceil(math.sqrt(p))+2
    g(x,y)=x^2+y^2
    plot1 = implicit_plot(g-p, (-viewsize,viewsize),
        (-viewsize,viewsize), plot_points = 100)
    grid_pts = [[i,j] for i in [-viewsize..viewsize] for j
        in [-viewsize..viewsize]]
    plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
    lattice_pts = [coords for coords in grid_pts if
        (coords[1]-f*coords[0])%p==0]
    plot_lattice_pts = points(lattice_pts, rgbcolor =
        (0,0,1),pointsize=20)
    show(plot1+plot_grid_pts+plot_lattice_pts, figsize =
        [5,5], xmin = -viewsize, xmax = viewsize, ymin =
        -viewsize, ymax = viewsize, aspect_ratio=1)


For one final preliminary, let’s define one more thing for any old point $(x, y)$
in the integer lattice (and especially for our blue dots).


Definition 13.4.4 We call the norm of a point $(x, y)$ the sum of squares,
$N(x, y) = x^2 + y^2$.


13.4.2 Primes which are sums of squares 


We are now ready to state our big theorem for the section. (See Fact 14.1.8 
for a quite different proof.) 


Theorem 13.4.5 Every prime p of the form 4k +1 can be written as a sum 
of squares. 

Proof. The proof is fairly long. Here is the strategy; the first step will be
detailed in Subsection 13.4.3 and Subsection 13.4.4.

Suppose we find some blue dot $(a, af + bp)$ such that

$$0 < N(a, af + bp) = a^2 + (af + bp)^2 < 2p.$$

Then we know, modulo $p$, that

$$N(a, af + bp) = a^2 + (af + bp)^2 \equiv a^2 + (af)^2 = a^2 + a^2 f^2 \equiv a^2 - a^2 = 0 \pmod{p},$$

so $p$ in fact divides the norm of the point $(a, af + bp)$.

So we have that $0 < a^2 + (af + bp)^2 < 2p$ and that $p \mid a^2 + (af + bp)^2$, meaning
the only possibility is $p = a^2 + (af + bp)^2$, which gives $p$ explicitly written as
a sum of perfect squares. ∎
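The strategy can be tried numerically before it is proved. The sketch below (plain Python; `two_squares_via_lattice` is a hypothetical name, not Sage’s `two_squares`) walks through $a = 1, 2, \ldots$, picks $b$ so that $af + bp$ is as small as possible in absolute value, and stops at the first blue point of norm less than $2p$.

```python
from math import factorial


def two_squares_via_lattice(p):
    """For a prime p ≡ 1 (mod 4), search the blue points (a, a*f + b*p)
    for one with norm below 2p and return it as a pair of nonnegative integers."""
    f = factorial((p - 1) // 2) % p
    for a in range(1, p):
        c = (a * f) % p
        if c > p // 2:
            c -= p               # same blue column, but closer to the x-axis
        if a * a + c * c < 2 * p:
            return a, abs(c)
    return None                  # not expected to happen, by Theorem 13.4.5


for p in [5, 13, 17, 29, 37, 41]:
    print(p, two_squares_via_lattice(p))
```

That the search never comes back empty is precisely what the rest of the section has to justify.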


Example 13.4.6 For instance, with $p = 5$, we have that $f = \left(\frac{5 - 1}{2}\right)! = 2! = 2$,
so we need to find a point $(a, 2a + 5b)$ such that

$$a^2 + (2a + 5b)^2 < 2p.$$

Guess and check with $a = 1$ and $b = 0$ gives us

$$N(1, 2 \cdot 1 + 5 \cdot 0) = 1^2 + (2 \cdot 1 + 5 \cdot 0)^2 = 5 < 2 \cdot 5 = 10$$

so this point should work, and this does give the correct statement that

$$5 = 1^2 + 2^2.$$

What remains to be shown is that there actually is such a blue dot.


13.4.3 Visualizing the proof 


To prove the theorem that for any p = 4k + 1 we can write it as a sum of 
squares, we need to prove there is a blue dot (somewhere) that is not at the 
origin but also has norm smaller than 2p. We will prove this by heavy reference 
to graphics, but all claims also make sense algebraically. Sometimes we need 
help to be able to think about more involved proofs. 

We include a variation on the graphic in Figure 13.4.7 to make this visually
clear. The bigger circle is the one we care about now; it has formula $x^2 + y^2 = 2p$,
so radius $\sqrt{2p}$. If we find a blue point inside the disk bounded by that
circle, but not at the origin, then the argument in the proof sketch given for
Theorem 13.4.5 shows this point must be on the smaller circle.


Figure 13.4.7 The lattice with the second circle


Here is an interactive version. 


@interact
def _(p=(5,[q for q in prime_range(200) if q%4==1])):
    f=mod(factorial((p-1)/2),p)
    viewsize=floor(sqrt(2*p))+2
    g(x,y)=x^2+y^2
    plot1 = implicit_plot(g-p, (-viewsize,viewsize),
        (-viewsize,viewsize), plot_points = 100)
    plot2 = implicit_plot(g-2*p, (-viewsize,viewsize),
        (-viewsize,viewsize), plot_points = 100)
    grid_pts = [[i,j] for i in [-viewsize..viewsize] for j
        in [-viewsize..viewsize]]
    plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
    lattice_pts = [coords for coords in grid_pts if
        (coords[1]-f*coords[0])%p==0]
    plot_lattice_pts = points(lattice_pts, rgbcolor =
        (0,0,1),pointsize=10)
    show(plot1+plot2+plot_grid_pts+plot_lattice_pts, figsize
        = [5,5], xmin = -viewsize, xmax = viewsize, ymin =
        -viewsize, ymax = viewsize, aspect_ratio=1)


Very strangely, the best way to do this is by considering the areas of the 
various circles, and showing that they are so big you just must have a blue 
point in their interior (but not at the origin). Let’s see how this works. 

The area of the bigger circle, which has radius $\sqrt{2p}$, is $\pi(\sqrt{2p})^2 = 2\pi p$.
Since $\pi > 2$, we have that $2\pi > 2(2) = 4$, which means that the area of the
bigger circle is bigger than $4p$.

What we do now is to create a sublattice of the blue dots, which we will 
color green. (This is just a subset of a lattice which still otherwise satisfies 
the conditions for being a lattice.) To create the green sublattice, take all blue 
dots, and just double their coordinates. Naturally, each green dot is still a blue 
dot, including the origin. See Figure 13.4.8. 


Figure 13.4.8 The lattice with two circles and triangles


Next, we take a look at certain triangles made by the different colored 
dots; continue following Figure 13.4.8, or see the interact at the end of this 
subsection. 

Compare the thinnest such triangles one can form, with respect to the 
vertical axis. 


• The thinnest triangle made by blue dots would be of height one. A
typical one would have vertices the origin and the points $(p, 0)$ (with
$a = p$, $b = -f$) and $(-f, 1)$ (with $a = -f$, $b = k$ where $f^2 + 1 = kp$ as
above).


• The thinnest triangle made by the green dots has height two. It has
width $2p$ (from the origin to $(2p, 0)$, the previous point doubled); the
apex is the point $(-2f, 2)$, which is $(-f, 1)$ doubled.


This triangle has area 4p/2. (Note that depending on whether f is positive or 
negative this triangle might be above or below the x-axis.) 

Now consider the parallelogram with the solid red lines made of two of
these triangles, from the origin to $(-2f, 2)$ to $(2p, 0)$ to $(2p + 2f, -2)$ and
back. (Recall that $f$ is a square root of $-1$ modulo $p$.) This quadrilateral has
area $4p$, which means its area is smaller than that of the bigger circle.
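If you would like to double-check the area claim, the standard shoelace formula does it directly from the four vertices. This is a plain Python sketch; both function names are mine, and the vertex list is the one given in the paragraph above.

```python
from math import pi


def shoelace_area(vertices):
    """Area of a simple polygon from its vertices listed in order."""
    total = 0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def green_parallelogram(p, f):
    """Vertices 0, (-2f, 2), (2p, 0), (2p + 2f, -2) of the red parallelogram."""
    return [(0, 0), (-2 * f, 2), (2 * p, 0), (2 * p + 2 * f, -2)]


p, f = 5, -3
print(shoelace_area(green_parallelogram(p, f)), 4 * p, 2 * pi * p)
```

The first two printed values agree, and both are smaller than the third, the area $2\pi p$ of the bigger circle.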

In Figure 13.4.8 we have $p = 5$ and $f = -3$. To see this all interactively,
evaluate the interact; click triangles_on to see the green dot triangle and
parallelogram outlined in red.


@interact
def _(p=(5,[q for q in prime_range(200) if q%4==1]),
      triangles_on=False):
    f=mod(factorial((p-1)/2),p)
    viewsize=2*p
    g(x,y)=x^2+y^2
    plot1 = implicit_plot(g-p, (-viewsize,viewsize),
        (-viewsize,viewsize), plot_points = 100)
    plot2 = implicit_plot(g-2*p, (-viewsize,viewsize),
        (-viewsize,viewsize), plot_points = 100)
    plot3 = line([[0,0], [2*p-2*Integer(f),2], [2*p,0],
        [2*Integer(f),-2], [0,0]], rgbcolor=(1,0,0))
    plot4 = line2d([[0,0], [2*p,0]], rgbcolor=(1,0,0),
        linestyle='--')
    grid_pts = [[i,j] for i in [-viewsize..viewsize] for j
        in [-viewsize..viewsize]]
    plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
    lattice_pts = [coords for coords in grid_pts if
        (coords[1]-f*coords[0])%p==0]
    plot_lattice_pts = points(lattice_pts, rgbcolor =
        (0,0,1),pointsize=10)
    plot_lattice_pts2 = points([[2*coords[0],2*coords[1]]
        for coords in lattice_pts], rgbcolor =
        (0,1,0),pointsize=20)
    if triangles_on:
        show(plot1+plot2+plot3+plot4 + plot_grid_pts +
            plot_lattice_pts+plot_lattice_pts2, xmin =
            -viewsize/2, xmax = viewsize, ymin =
            -viewsize/2, ymax = viewsize/2, aspect_ratio=1)
    else:
        show(plot1+plot2+plot_grid_pts + plot_lattice_pts +
            plot_lattice_pts2, xmin = -viewsize/2, xmax =
            viewsize, ymin = -viewsize/2, ymax = viewsize/2,
            aspect_ratio=1)


The last stage of the proof is very visual. Before we move on, make sure you 
believe all the claims of this stage, especially the claims about areas. Those 
are the ones we will analyze more closely to finish the proof of Theorem 13.4.5. 
Remember always that we are trying to prove that there is a blue point
contained inside the disk bounded by the bigger blue circle, but away from the
origin.


13.4.4 Finishing the proof 
Let’s take stock. 


• We’ve created circles of various sizes to find points in, and two lattices
to examine.


• The area of the bigger circle is more than the area ($4p$) of the smallest
parallelogram made by green dots.

To finish the proof, we need to find a blue point other than the origin
interior to the bigger blue circle of radius $\sqrt{2p}$. The gist of the argument splits
into two parts.

First, we will pursue Claim 13.4.11:


• Because all points inside the parallelogram (not just green, blue, or
lattice points) will “repeat” outside of it in another parallelogram, $4p$ is
the biggest area of a region that you can have and not “repeat” some
point. (This parallelogram is often called a fundamental region in
more general treatments.)


• So, the interior of the circle, having a bigger area, must have two points
(not necessarily blue points, just points on the plane) which are “repeated”
by translation of this parallelogram.


We will expand on exactly what “repeat” means momentarily.
Secondly, we show why the previous claim leads to a proof in Claim 13.4.12:


• We start with the two points from Claim 13.4.11 in the disk bounded by
the circle (points which are not necessarily on any lattice, blue, green, or
even black).


• Then we use elementary geometry to construct a blue point (namely,
one of the form $(a, af + bp)$) which is strictly in the interior of the disk
bounded by the circle of radius $\sqrt{2p}$. In particular, this point is not the
origin.


The argument in Theorem 13.4.5 now finishes the proof. 
Let’s begin the final push to prove the two claims with a fact and a definition 
which explain what sort of points we are looking for. 


Fact 13.4.9 Let $L$ be the parallelogram with vertices $(0, 0)$, $(-2f, 2)$, $(2p, 0)$,
and $(2p + 2f, -2)$ and its interior (where $f$ is a square root of $-1$ modulo $p$).
Any plane region is the union of its intersections with all possible translations
of $L$ obtained by rigidly moving $L$ so that the origin is translated to another green point.
Proof. We are not going to prove topological facts in this text, nor explore
the further depths of lattices. So it suffices to note that every green point
$(2a, 2af + 2bp)$ can serve as the leftmost vertex of a unique parallelogram not just
congruent to, but translated from, $L$, and that by construction these cannot
overlap (other than possibly along their edges). ∎


Definition 13.4.10 We say that two distinct points $v, w$ in a plane region are
“repeated” if they are both rigid translations of the same point in $L$, where the
allowed translations are those described in Fact 13.4.9.

We now prove the two remaining claims to finish the proof of Theorem 13.4.5,
after which we encourage the reader to explore the large interact in
Example 13.4.13 which ends the section.


Claim 13.4.11 Consider the circle of radius $\sqrt{2p}$ centered at the origin. The
interior of the disk bounded by this circle has two points “repeated” by shifting
the parallelogram $L$.

Proof. Recall from Fact 13.4.9 that the disk is composed of all its intersections
with different parallelograms congruent to $L$.

Suppose that there are not two points “repeated” within the disk (not including
the boundary circle). Then every point thereof is a translation of a different
point of $L$. One can make this a one-to-one function from the disk to $L$ by
sending each point in the disk to the corresponding one in $L$.

Because each such move is rigid, this function is area-preserving⁸, which means
the area of the disk must be less than or equal to that of $L$.

However, at the end of Subsection 13.4.3 we asserted the opposite! So by way
of contradiction we have our two points. ∎


Claim 13.4.12 Given two points $v, w$ (in the interior of the circle of radius
$\sqrt{2p}$ centered at the origin) which “repeat” from $L$, we can construct a point,
not the origin, of the form $(a, af + bp)$.

Proof. Given how we defined “repetition”, we know that the line segments
from $v$ and $w$ to the leftmost vertex of their respective translations of $L$ must
themselves be rigid translations of each other, hence the line segment
connecting $v$ and $w$ can be translated to a segment connecting the origin and another
green point. Give this point the name⁹ $v - w$.

Since $v - w$ is of the form $(2a, 2af + 2bp)$ by definition, then the point halfway
between it and the origin (or “$(v - w)/2$”) is a blue point of the form $(a, af + bp)$,
and clearly not the origin since $v - w$ itself is not the origin. It remains to show
that this blue point is in the interior of the circle.

To see this, consider the distance $d$ between $v$ and $w$. Since both points lie
strictly inside the circle, $d$ must be strictly less than twice the radius, that is,
less than $2\sqrt{2p}$. But then $v - w$ cannot be more than $d$ units from the origin, so the
point $(a, af + bp)$, being exactly half that distance from the origin, is less than
distance $\sqrt{2p}$ from the origin. By definition $(a, af + bp)$ is in the interior of the
larger circle, as desired. ∎
Example 13.4.13 In Figure 13.4.14 we see the picture of how Claims 13.4.11
and 13.4.12 find the blue point in the circle. The black points are $v$ and $w$, the
arrows point between $v$ and $w$ and from the origin to $v - w$, and the midpoint
of the second arrow is indeed blue.


Figure 13.4.14 How to find the lattice point on the circle


⁸ If you looked at this footnote because you want a proof of this, recall we do not prove
topological facts in this text! Next you’ll be wanting a proof of the Jordan curve theorem from
first principles. More seriously, we have to draw the line somewhere, and I find pedagogically
that students would find proving assertions of this kind similar to proving $1 + 1 = 2$ using
Russell and Whitehead as a text. Convincing students that proving Fact 1.2.2 is useful is
hard enough.

⁹ In fact, as vectors of course this is the point, but we minimize formal use of vectors in
this text.

Sage note 13.4.15 Examining code is good for you. The next Sage cell
makes Figure 13.4.14 interactive. But don’t just use it to view the proof for 
other primes; examine the code itself. 

This is by far the longest code we’ve seen up to this point. It is a brute force 
check of all movements of all points in the parallelogram to find two points in 
the bigger circle. Can you think of ways to make it more efficient? 


@interact
def _(p=(5,[q for q in prime_range(200) if q%4==1])):
    f=Integer(mod(factorial((p-1)/2),p))
    big = math.floor(math.sqrt(2*p))
    viewsize=2*p
    g(x,y)=x^2+y^2
    plot1 = implicit_plot(g-p, (-viewsize,viewsize),
        (-viewsize,viewsize), plot_points = 100)
    plot2 = implicit_plot(g-2*p, (-viewsize,viewsize),
        (-viewsize,viewsize), plot_points = 100)
    plot3 = line([[0,0], [2*p-2*f,2], [2*p,0], [2*f,-2],
        [0,0]], rgbcolor=(1,0,0))
    plot4 = line2d([[0,0],[2*p,0]], rgbcolor=(1,0,0),
        linestyle='--')
    grid_pts = [[i,j] for i in [-viewsize..viewsize] for j
        in [-viewsize..viewsize]]
    plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
    lattice_pts = [coords for coords in grid_pts if
        (coords[1]-f*coords[0])%p==0]
    plot_lattice_pts = points(lattice_pts, rgbcolor =
        (0,0,1),pointsize=10)
    big_lattice_pts = [[2*coords[0],2*coords[1]] for coords
        in lattice_pts]
    plot_lattice_pts2 = points(big_lattice_pts, rgbcolor =
        (0,1,0),pointsize=20)

    w = []
    v = []
    vmw = []

    # First pass: shift points on the lower edge of the parallelogram.
    for i in [1..2*p-1]:
        for coords in [l for l in big_lattice_pts if l!=[0,0]]:
            if (i+coords[0])^2+(coords[1]-1)^2 < 2*p:
                for coords2 in [k for k in big_lattice_pts
                        if k!=[0,0] and k!=coords]:
                    if (i+coords2[0])^2 + (coords2[1]-1)^2 < 2*p:
                        w = [i+coords[0],coords[1]-1]
                        v = [i+coords2[0],coords2[1]-1]
                        vmw = [v[0]-w[0],v[1]-w[1]]
                        break
            if w: break
        if w: break
    # Second pass, only if nothing was found yet.
    if not v:
        for i in [j for j in [f..p+f]]:
            for coords in [l for l in big_lattice_pts if l!=[0,0]]:
                if (i+coords[0])^2+(coords[1]-1)^2 < 2*p:
                    for coords2 in [k for k in big_lattice_pts
                            if k!=[0,0] and k!=coords]:
                        if (i+coords2[0])^2 + (coords2[1]-1)^2 < 2*p:
                            w = [i+coords[0],coords[1]-1]
                            v = [i+coords2[0],coords2[1]-1]
                            vmw = [v[0]-w[0],v[1]-w[1]]
                            break
                if w: break
            if w: break
    # Third pass: the same search on the other side of the x-axis.
    if not v:
        for i in [j for j in [p-f..2*p-f]]:
            for coords in [l for l in big_lattice_pts if l!=[0,0]]:
                if (i+coords[0])^2+(coords[1]+1)^2 < 2*p:
                    for coords2 in [k for k in big_lattice_pts
                            if k!=[0,0] and k!=coords]:
                        if (i+coords2[0])^2+(coords2[1]+1)^2 < 2*p:
                            w = [i+coords[0],coords[1]+1]
                            v = [i+coords2[0],coords2[1]+1]
                            vmw = [v[0]-w[0],v[1]-w[1]]
                            break
                if w: break
            if w: break
    P1=point(v,pointsize=20,rgbcolor=(0,0,0))
    P2=point(w,pointsize=20,rgbcolor=(0,0,0))
    Z=point(vmw,pointsize=20,rgbcolor=(0,0,0))
    plot4 = arrow(w,v,rgbcolor=(0,0,0), thickness=1,
        linestyle='--', arrowsize=3)
    plot5 = arrow((0,0),vmw,rgbcolor=(0,0,0), thickness=1,
        linestyle='--', arrowsize=3)
    plot6 = point((vmw[0]/2,vmw[1]/2),pointsize=30)
    show(plot1+plot2+plot3+plot4 + P1+P2+Z+plot4+plot5+plot6
        + plot_grid_pts + plot_lattice_pts +
        plot_lattice_pts2, figsize = [5,5], xmin =
        -viewsize/2, xmax = viewsize, ymin = -viewsize/2,
        ymax = viewsize/2, aspect_ratio=1)
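As one possible answer to the efficiency question, here is a discrete variant of the pigeonhole idea in plain Python. It is an assumption of this sketch, not the argument in the text, that it suffices to look only at integer points of the open disk: each is labelled by its class modulo the green lattice (there are $4p$ classes), and the search stops at the first repeated label. The function name `blue_point_by_pigeonhole` and the `key` encoding are my own.

```python
from math import factorial, isqrt


def blue_point_by_pigeonhole(p):
    """For a prime p ≡ 1 (mod 4), return a nonzero blue point of norm < 2p."""
    f = factorial((p - 1) // 2) % p
    r = isqrt(2 * p)            # integer points of the disk have |x|, |y| <= r
    seen = {}                   # class label -> first point found with that label
    for x in range(-r, r + 1):
        for y in range(-r, r + 1):
            if x * x + y * y >= 2 * p:
                continue        # outside the open disk of radius sqrt(2p)
            # Two integer points get the same key exactly when their
            # difference is green, i.e. of the form (2a, 2(a*f + b*p)).
            key = (x % 2, y % 2,
                   ((y - y % 2) // 2 - f * ((x - x % 2) // 2)) % p)
            if key in seen:
                wx, wy = seen[key]
                return (x - wx) // 2, (y - wy) // 2   # the blue point (v - w)/2
            seen[key] = (x, y)
    return None                 # no repeat found among integer points


for p in [5, 13, 17, 29]:
    a, c = blue_point_by_pigeonhole(p)
    print(p, (a, c), a * a + c * c)
```

Each prime needs only one pass over the integer points of the disk, instead of the nested loops over pairs of green points used in the Sage cell.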


Believe it or not, we’ve concluded the proof — whew! 
Why was this so hard? I can think of three reasons. 


• First, we are trying to prove something about squares by proving
something about square roots. It works, but it means there will be many
steps.


• Secondly, we are not just algebraically proving it exists by solving an
equation; we are forced to prove our square root exists with inequalities,
which brings another set of complications.


• Third, we chose to examine those inequalities geometrically to gain
insight, so our proofs must use that insight: worthwhile, but stretching.


Historical remark 13.4.16 Hermann Minkowski. Many more theorems
of this kind, such as Lagrange’s four square theorem, can be proved using
similar techniques, which we are intentionally avoiding stating in their full
generality. The names of Minkowski and Blichfeldt are associated with
theorems using various symmetries and the notion of convexity in order to apply
things more generally. Those who have had some physics may have heard of
Minkowski before, as his work nearly beat Einstein to the notion of special
relativity; his geometric framework for space-time gave Einstein the necessary
apparatus to generalize to curved spacetime and general relativity.


13.5 All the Squares Fit to be Summed 


There is one loose end. What are all the numbers we can represent as a sum 
of squares? 

For instance, why are some composite numbers of the form $4k + 1$ not
writeable as the sum of two squares? Also, many even numbers are representable;
how do we tell which even numbers are writeable? We conclude our discussion
by proving the full statement, after a couple of preliminary lemmas.


Lemma 13.5.1 If $N$ has only primes of the form $4k + 1$ and 2 as factors, it
is writeable as a sum of two squares.

Proof. Each of those primes is representable (by Theorem 13.4.5, together with
$2 = 1^2 + 1^2$), so we can use Fact 13.1.9 to write
all intermediate products as a sum of squares. Hence all such products are
representable. ∎


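Example 13.5.2 Consider this:

$$442 = 2 \cdot 13 \cdot 17 = (1^2 + 1^2)(3^2 + 2^2) \cdot 17$$

$$= \left[(1 \cdot 3 - 1 \cdot 2)^2 + (1 \cdot 2 + 1 \cdot 3)^2\right](4^2 + 1^2)$$

$$= (1^2 + 5^2)(4^2 + 1^2) = (1 \cdot 4 - 5 \cdot 1)^2 + (1 \cdot 1 + 5 \cdot 4)^2 = 1^2 + 21^2.$$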


Lemma 13.5.3 If the powers of prime factors of $N$ of the form $4k + 3$ are
only even powers, then $N$ is writeable as a sum of two squares.

Proof. First, $p^2$ (even if $p$ is not prime) is trivially always representable, since
$p^2 = p^2 + 0^2$. Now, rather than using Fact 13.1.9, let $P$ be the product of
all prime factors of the form $4k + 3$ (taken with their powers), which is necessarily
a perfect square $P = Q^2$, given that all the powers are even. We can simply multiply this by
$N/Q^2 = a^2 + b^2$, which is possible by Lemma 13.5.1 since dividing by $Q^2$ removes all primes
of the form $4k + 3$ in the prime factorization. This yields $N = (aQ)^2 + (bQ)^2$. ∎


Example 13.5.4 Consider this:

$$35802 = 442 \cdot 3^4 = (1^2 + 21^2) \cdot 3^2 \cdot 3^2$$

$$= (3^2 + 63^2) \cdot 3^2 = 9^2 + 189^2.$$


Theorem 13.5.5 N can be written as a sum of two perfect squares precisely
if it has only even powers (including zeroth powers) of any primes of the form
4k + 3.
Proof. From 13.5.1 and 13.5.3, the only case left to consider is when N has a prime
of the form p = 4k + 3, but to an odd power. This seemed to be the bottleneck
in our exploration.

By way of contradiction, suppose that it is possible to write

$$N = a^2 + b^2.$$

First, divide this equation by any factors of p common to N, a, and b to get

$$M = c^2 + d^2.$$

The power of p we divided by (so that $N = M p^{\ell}$) must be an even power, since
each term on the right-hand side is a perfect square and can only contribute
even powers of primes by the Fundamental Theorem of Arithmetic.

Since N had an odd power of p, we know M still has an odd power of p dividing
it, yet $p \nmid c, d$. (If p divided one of c and d, then since p divides M it would
divide the other as well, and we already removed all such common factors.)

Take everything modulo p to get the congruence

$$0 \equiv c^2 + d^2 \pmod p.$$

Since $p \nmid c$, there is an inverse $c^{-1}$ modulo p, and we can multiply this congruence by $(c^{-1})^2$ to get

$$0 \equiv 1 + (c^{-1}d)^2 \Rightarrow -1 \equiv (c^{-1}d)^2 \pmod p.$$

This is a contradiction, as by Fact 13.3.2 there is no square root of −1 modulo
p for p = 4k + 3, finishing the proof! □


Example 13.5.6 This theorem fully explains why $21 = 7 \cdot 3$ and the others
mentioned in Fact 13.1.2 cannot be written as a sum of squares.


If the whole theorem still seems too neat and dried, it can be instructive to 
get insight by plugging in different n below. When do you get an error, when 
not? 


n=20
print(factor(n))
print(two_squares(n))


2^2 * 5
(2, 4)
(As a bonus, can you turn this into an interactive cell? See Sage note 12.6.8.) 
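
If you would like to test Theorem 13.5.5 on many numbers at once without Sage, here is a minimal sketch in plain Python. The helper names (`factorize`, `passes_criterion`, `find_two_squares`) are my own and purely illustrative; the search is brute force and only meant for small n.

```python
from math import isqrt

def factorize(n):
    """Prime factorization of n >= 1 as a dict {prime: exponent}, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def passes_criterion(n):
    """Criterion of Theorem 13.5.5: each prime of the form 4k + 3 occurs to an even power."""
    return all(e % 2 == 0 for p, e in factorize(n).items() if p % 4 == 3)

def find_two_squares(n):
    """Brute-force search for (a, b) with a <= b and a^2 + b^2 == n; None if there is none."""
    for a in range(isqrt(n // 2) + 1):
        b = isqrt(n - a * a)
        if a * a + b * b == n:
            return (a, b)
    return None

for n in [20, 21, 442, 35802]:
    print(n, factorize(n), passes_criterion(n), find_two_squares(n))
```

The theorem predicts that `passes_criterion(n)` is true exactly when `find_two_squares(n)` finds something; comparing the two over a range of n is a good sanity check.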


13.6 A One-Sentence Proof 


There is a completely different approach to this problem which has gained 
some notoriety. Often one wants multiple approaches in order to understand 
a problem more deeply; here, we have picked a geometric approach. 

It happens that D. Zagier provided the culmination of a series of proofs 
using only sets and functions, and that proof takes only one sentence to write 
down! This is reproduced from the famous article [E.7.2] with the following 
title: 


Proposition 13.6.1 A One-Sentence Proof that Every Prime $p \equiv 1 \pmod 4$ is a Sum of Two Squares.
Proof. The involution on the finite set

$$S = \{(x,y,z) \in \mathbb{N}^3 \mid x^2 + 4yz = p\}$$

defined by

$$(x,y,z) \mapsto \begin{cases} (x+2z,\ z,\ y-x-z) & \text{if } x < y-z \\ (2y-x,\ y,\ x-y+z) & \text{if } y-z < x < 2y \\ (x-2y,\ x-y+z,\ y) & \text{if } x > 2y \end{cases}$$

has exactly one fixed point, so |S| is odd and the involution defined by
$(x,y,z) \mapsto (x,z,y)$ also has a fixed point. □
In Exercise Group 13.7.19-23, you will be asked to verify the various statements
that this proof depends on. Although perhaps it is not the easiest single
sentence after all, it is still fun, fun enough that you can watch a couple videos
about it (footnote 10) from Numberphile!
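
Before verifying the pieces by hand, it can help to watch the proof work on actual primes. The following is a minimal sketch in plain Python, taking $\mathbb{N}$ to mean the positive integers; the names `zagier_set`, `zagier_map`, and `two_squares_from_zagier` are hypothetical labels of my own.

```python
def zagier_set(p):
    """All triples (x, y, z) of positive integers with x^2 + 4yz == p."""
    triples = []
    x = 1
    while x * x < p:
        rest = p - x * x
        if rest % 4 == 0:
            m = rest // 4                 # we need y * z == m
            for y in range(1, m + 1):
                if m % y == 0:
                    triples.append((x, y, m // y))
        x += 1
    return triples

def zagier_map(t):
    """The three-case involution from the proof (p prime rules out x == y - z and x == 2y)."""
    x, y, z = t
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if x < 2 * y:                         # here y - z < x < 2y
        return (2 * y - x, y, x - y + z)
    return (x - 2 * y, x - y + z, y)      # here x > 2y

def two_squares_from_zagier(p):
    """A fixed point of (x, y, z) -> (x, z, y) has y == z, so p = x^2 + (2y)^2."""
    for (x, y, z) in zagier_set(p):
        if y == z:
            return (x, 2 * y)
    return None

for p in [5, 13, 17, 29]:
    S = zagier_set(p)
    fixed = [t for t in S if zagier_map(t) == t]
    print(p, len(S), fixed, two_squares_from_zagier(p))
```

For each prime tried, you should see an odd number of triples and a single fixed point of the three-case map.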


13.7 Exercises 


1. Prove that if $n \equiv 3 \pmod 4$, then n cannot be written as a sum of two
squares (13.1.1).

2. Prove Fact 13.1.2.

3. Show that if $n \equiv 7 \pmod 8$, then n cannot be written as a sum of three
perfect squares. (See also Exercise 14.4.6.)

4. Find two numbers that can be written as a sum of three squares in two
essentially different ways (not just $1^2 + 0^2 + 0^2 = 0^2 + 1^2 + 0^2$ or even
$3^2 + 4^2 + 1^2 = 0^2 + 5^2 + 1^2$). (See also Exercise 14.4.4.)

5. Find as many integers n as possible which are only writeable as a sum of
squares via $n = a^2 + a^2 = 2a^2$, i.e. n is not writeable as a sum of distinct
squares.

6. Verify Fact 13.1.7 by hand (i.e. write all the algebra out).

7. Let $r_2(n)$ be the number of different ways to write n > 0 as a sum of
two squares, where every different way (not just essentially different) is
counted. For instance,

$r_2(2) = 4$ because (−1, 1), (−1, −1), (1, 1), (1, −1) all work.

Prove that $r_2(2^m) = 4$ for all m > 1.


Exercise Group. Let N be odd, and let $N = a^2 + b^2$ and $N = c^2 + d^2$, where
the pairs (a,b) and (c,d) are both positive and not the same or just switched
in order. Verify the following to finish the proof of Fact 13.2.1.
8. It’s okay to assume that a and c are odd and b and d are even, with
a > c and d > b.
9. If this is the case, show that k = gcd(a − c, d − b) and n = gcd(a + c, d + b)
are both even.
10. Assuming the previous two exercises, show that $\frac{a-c}{k} = \frac{d+b}{n}$ and $\frac{a+c}{n} = \frac{d-b}{k}$.
11. Assuming everything else works, show that N is in fact the product of 
the terms in question; this will involve a fair amount of cancellation! 
12. Using the tools of this chapter, for each of the numbers 5095, 5096, 5097, 
5098, and 5099, either write it as a sum of two perfect squares or explain 
why it is impossible to do so. 
Exercise Group. Pick four random (to you) three digit numbers which are 
not of the form 4k + 3. 
13. Decide whether these numbers are a sum of two squares without using 
Sage. 


14. Pick two of those numbers and write them in all possible ways as a 
sum of two squares. 


10 www.youtube.com/watch?v=yGsIw8LHXM8
15. Show a positive integer k is the difference of two squares if and only if
$k \not\equiv 2 \pmod 4$.

16. Prove that if $n \equiv 12 \pmod{16}$, then n cannot be written as a sum of two
squares.

17. Is there any congruence condition modulo 6 for which a number cannot
be written as a sum of two squares?

18. Referring to the proof of the main theorem (especially in Subsection 13.4.3):
Check that the pictures you get from some other primes
with these lattices really work.


Exercise Group. Check every piece of the Zagier proof (Proposition 13.6.1).

19. The set S is finite. Try figuring out what S is for p = 5 or p = 13, the
smallest such primes.

20. Each (x, y, z) has exactly one of the three things to go to.

21. The function in question is an involution. That is, if you take the
output and apply the function a second time, you get your original
(x, y, z) back (this is a little tougher).

22. If (x, y, z) goes to (x, y, z) then it turns out that $(x, y, z) = \left(1, 1, \frac{p-1}{4}\right)$
(you will probably need to use the definition of S for this, and remember
that we assume $p \equiv 1 \pmod 4$).

23. That if the map $(x, y, z) \mapsto (x, z, y)$ has a point which is fixed (the
output is same as input) then this, combined with the definition of S,
means that p is writeable as the sum of two squares.

24. Prove the assertion about $-\left(\frac{p-1}{2}\right)!$ in Remark 13.3.4.


Summary: Sums of Squares 


This chapter examines the question of what numbers may be written as a 
sum of two perfect (integer) squares. 


1. First an exploration of the problem is in order, including a geometric
interpretation and the famous identity Fact 13.1.7.

2. In Proposition 13.2.4 we show that prime numbers may essentially only
be written in one way as such a sum.

3. Defining the square root of a number, modulo n, is the content of
Definition 13.3.1, which we then immediately use to find out when −1 has a
square root.

4. The proof of Theorem 13.4.5 that primes of the form 4k + 1 can be written
as a sum of squares is a real geometric treat.

5. In the penultimate section we prove Theorem 13.5.5, which explains why
even though 21 can be written as 4k + 1, it cannot be written as a sum
of squares.

6. We finish with A One-Sentence Proof of the main theorem of this chapter.


The Exercises give practice filling in many of the smaller details of the proofs. 
Chapter 14


Beyond Sums of Squares 


There are many fascinating topics that sums of squares connect to. This chap- 
ter gives some interesting points of view on several. 


14.1 A Complex Situation 


14.1.1 A new interpretation 
Let’s see another way to interpret sums of squares. Suppose first that, as before,

$$n = a^2 + b^2.$$

Then, if we let the symbol i stand for a (putative) square root of negative one,
so that $-1 = i^2$, we could legitimately factor the equation:

$$n = a^2 - (i^2 b^2) = (a + bi)(a - bi)$$

Example 14.1.1 For instance, we could factor the prime number thirteen!


print(3^2+2^2)
print((3+2*i)*(3-2*i))


13
13


It turns out that there is a close connection between the theory of
numbers representable as a sum of two squares and the following beautiful
definition.


Definition 14.1.2 Gaussian integers. The Gaussian integers $\mathbb{Z}[i]$ may
be defined as the set

$$\mathbb{Z}[i] = \{a + bi \mid a, b \in \mathbb{Z}\}$$

This does assume that we can have such a symbol i with $i^2 = -1$; typically this
set is thus considered to be a subset of the so-called complex numbers, denoted
$\mathbb{C}$.


Historical remark 14.1.3 Carl Friedrich Gauss. These are named after 
our friend Gauss, who explored them a great deal, though others were at least 
incipiently aware of them. 

There are so many stories about Gauss that one can hardly know where to 
begin. The most-quoted one is his quick solution to summing the numbers from 
1 to 100 as a child; however, some of his most important work was in physics and 
magnetism. As an adolescent he kept a fascinating notebook of stunning results.
He also was one contributor to the beginnings of modern statistics, proved the 
fundamental theorem of algebra, helped survey a large part of Germany, and 
in his own way mentored a number of important mathematicians, including 
Eisenstein (see Section 17.2), Riemann (see Chapter 24) and Germain (see 
Subsection 11.6.4, and below). 

Gauss will come up again in Section 17.4 regarding solving congruences, 
and when we continue exploring prime numbers in Section 21.2. Annoyingly,
he only published some of his many results (notably in number theory); most
relevant here is that the Gaussian integers are something he actually did publish
about.

If we bring back our lattice of integer points, we can think of such numbers
as being points on the lattice, where the coordinate point (3,2) corresponds to
3 + 2i, one of the ‘factors’ of 13. I'll plot both ‘factors’ below.


Figure 14.1.4 Factors of 13 as a Gaussian integer 


There are many amazing questions to ask about this, and wonderful con- 
nections to abstract algebra. For example, the factorization 


$$a^2 + b^2 = (a + bi)(a - bi)$$


requires i, a “square root of negative one” over the integers, so we shouldn’t
be surprised that writing as a sum of squares has a connection with “square
roots modulo n”. This connection is actually more direct than we have seen,
and we will show some of it in the next section.


14.1.2 Revisiting the norm 


How can we decide whether the verb “to factor” is legitimate to use in a given 
number system? In the Gaussian integers, the reason we can is that prime 
numbers can be defined for this new system as well. 
Fact 14.1.5 Prime numbers in the Gaussian integers, or Gaussian primes,
are of three possible forms:

• Given a prime $p \in \mathbb{Z}$ of the form p = 4n + 3, $\pm p \in \mathbb{Z}[i]$ is prime.

• Given a prime $p \in \mathbb{Z}$ of the form p = 4n + 3, $\pm p \cdot i \in \mathbb{Z}[i]$ is also prime.

• Given a prime $p \in \mathbb{Z}$ not of the form p = 4n + 3, any factors a + bi
and a − bi in $\mathbb{Z}[i]$ corresponding to writing $p = a^2 + b^2$ are prime (recall
Theorem 13.5.5).

The last point can be confusing. Since a and b could be positive or negative,
and may be distinct, it can be useful to think of the primes thus generated as
$\pm a + bi$, $\pm b + ai$. In Figure 14.1.4 this means that not only 3 + 2i, but also
−3 + 2i, 2 + 3i, and −2 + 3i are all Gaussian primes. This is related to the
notion of associates in ring theory; see also the end of this subsection.

Viewing these Gaussian primes is fun. Many authors have created beautiful
graphics (footnote 1) such as the one in Figure 14.1.6.


Figure 14.1.6 Plot of Gaussian primes with coordinates less than 10 in absolute value


You can keep exploring the beauty of this pattern in the following interact. 


@interact
def _(viewsize=10):
    # all integer lattice points in the viewing window
    lattice_pts = [[i,j] for i in [-viewsize..viewsize] for j in [-viewsize..viewsize]]
    plot_lattice_pts = points(lattice_pts, rgbcolor=(0,0,0), pointsize=2)
    GG.<I> = GaussianIntegers()
    # keep the lattice points x + y*I which are Gaussian primes
    Gaussian_primes = [x for x in lattice_pts if GG(x[0]+x[1]*I).is_prime()]
    # draw a unit square centered on each Gaussian prime
    plot_Gaussian_primes = sum([polygon([(G[0]+1/2,G[1]+1/2), (G[0]+1/2,G[1]-1/2), (G[0]-1/2,G[1]-1/2), (G[0]-1/2,G[1]+1/2)], alpha=.6) for G in Gaussian_primes])
    show(plot_Gaussian_primes+plot_lattice_pts, aspect_ratio=1)
    pretty_print(html("Plot of Gaussian primes with coordinates less than {0} in absolute value".format(viewsize)))


1 You can even order serving napkins with them as the design online (search
sannydezoete.nl for ‘primes’). The internet is amazing.


The basic reason this even makes sense is that we can use the Euclidean
algorithm here. First, let’s use the same definition of norm as we used in
Definition 13.4.4 for the points, so that $N(x + iy) = x^2 + y^2$.


Example 14.1.7 The norm of 3 + 2i is $3^2 + 2^2 = 13$ while the norm of
13 = 13 + 0i is 169.

The difference is that instead of saying simply that a = bq + r for r < |b|,
we will need to compare the norms of r and b. Namely, you can write two
Gaussian integers a and b as a = bq + r, where $0 \le N(r) < N(b)$. Continue
this process just as in the Euclidean algorithm, and it ends by the Well-Ordering
Principle to define gcd(a, b). In this case ±1 and ±i are all possible stopping
points if a and b don’t share a factor.

Further, if g and h are “relatively prime” Gaussian integers (gcd(g, h) = ±1
or ±i), then there are other such integers x and y such that gx + hy = 1. So
we have a Bezout identity as well to play with.

Computing with Gaussian integers this way is possible in Sage. 


ZZI.<I> = GaussianIntegers()
(1+I).is_prime()


True
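
To see why a quotient and remainder with N(r) < N(b) always exist, it may help to compute one by hand. Here is a minimal sketch in plain Python (not Sage's own implementation), with a Gaussian integer x + yi stored as the pair (x, y). The helper names are my own; the idea is to round the exact quotient a/b to the nearest lattice point, so each coordinate is off by at most 1/2 and N(r) is at most N(b)/2.

```python
def g_mul(u, v):
    """Product of Gaussian integers given as pairs (real, imaginary)."""
    (a, b), (c, d) = u, v
    return (a * c - b * d, a * d + b * c)

def norm(u):
    return u[0] ** 2 + u[1] ** 2

def round_div(n, d):
    """Nearest integer to n/d for integers n and d > 0 (ties round up)."""
    return (2 * n + d) // (2 * d)

def g_divmod(a, b):
    """Return (q, r) with a == b*q + r and norm(r) < norm(b), for b != (0, 0)."""
    nb = norm(b)
    # a/b = a * conj(b) / N(b); round each coordinate to the nearest integer
    t = g_mul(a, (b[0], -b[1]))
    q = (round_div(t[0], nb), round_div(t[1], nb))
    bq = g_mul(b, q)
    r = (a[0] - bq[0], a[1] - bq[1])
    return q, r

def g_gcd(a, b):
    """Euclidean algorithm in Z[i]; the answer is only determined up to a unit."""
    while b != (0, 0):
        _, r = g_divmod(a, b)
        a, b = b, r
    return a

print(g_divmod((7, 3), (2, 1)))
print(g_gcd((13, 0), (3, 2)))
print(g_gcd((7, 3), (2, 1)))
```

Because the gcd is only defined up to multiplication by ±1 or ±i, a different rounding convention could return an associate of these answers.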


Crucially, I am skipping whether we actually have unique factorization
in $\mathbb{Z}[i]$. This is true, and is used below in Fact 14.1.8, but properly belongs in
an abstract algebra course.


14.1.3 A different approach to sums of squares 


The Gaussian integers allow a quite different approach to the fact primes of the 
form 4n+1 can be written as a sum of squares. We could use complex numbers 
instead of geometry. Unfortunately, it requires us to take an algebraic fact on 
faith instead of the fact we proved using geometry; there are no shortcuts. Still, 
it’s worth looking at. 


Fact 14.1.8 If $p \equiv 1 \pmod 4$ is prime, then p can be written as a sum of two
squares. (This is Theorem 13.4.5.)
Proof. We already know, from the proof of Lemma 13.3.3, that

$$f = \left(\frac{p-1}{2}\right)!$$

is a square root of −1 modulo p. But now, instead of doing geometry, let’s look
at what that means.
By definition of

$$f^2 \equiv -1 \pmod p$$

we know that $p \mid f^2 + 1$. Since $f^2 + 1$ is $f^2 - i^2$, let’s factor:

$$f^2 + 1 = (f + i)(f - i).$$

Clearly p = p + 0i does not divide either f + i or f − i evenly in $\mathbb{Z}[i]$, but it does
divide their product. So (crucially!), if we assume the Fundamental Theorem
of Arithmetic still holds for Gaussian integers, then p factors in $\mathbb{Z}[i]$ and has
a prime divisor of the form a + bi (in the sense of Subsection 14.1.2) dividing
f + i or f − i.

Given that $a + bi \mid p$, it’s not hard to show that then a − bi also must divide p.
We'll skip this (but see the discussion after Fact 14.1.5 for ideas).

To finish up, combine these facts to see that

$$(a + bi)(a - bi) \mid p^2 \Rightarrow a^2 + b^2 \mid p^2$$

and the factor $a^2 + b^2$ is equal to neither 1 nor $p^2$, since a + bi was a proper divisor of
p. Since p is an integer prime, the only possibility is

$$a^2 + b^2 = p.$$

□
To emphasize that the assumption about Theorem 6.3.2 really matters, see 
Exercise 6.6.30. 
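
The proof of Fact 14.1.8 can be turned into a computation, at least experimentally. The following is a minimal sketch in plain Python, under the assumption (consistent with the proof, but not spelled out there) that a Gaussian gcd of p and f + i is a common divisor of norm p. Gaussian integers are stored as pairs, and the names `g_mod` and `sum_of_two_squares` are hypothetical helpers of my own.

```python
def g_mod(a, b):
    """Remainder of a divided by b in Z[i], using nearest-lattice-point rounding."""
    nb = b[0] ** 2 + b[1] ** 2
    # a * conj(b), whose coordinates divided by N(b) give the exact quotient a/b
    t = (a[0] * b[0] + a[1] * b[1], a[1] * b[0] - a[0] * b[1])
    q = ((2 * t[0] + nb) // (2 * nb), (2 * t[1] + nb) // (2 * nb))
    return (a[0] - (b[0] * q[0] - b[1] * q[1]),
            a[1] - (b[0] * q[1] + b[1] * q[0]))

def sum_of_two_squares(p):
    """For a prime p = 1 (mod 4), return (a, b) with a^2 + b^2 == p."""
    # f = ((p-1)/2)! reduced mod p is a square root of -1 modulo p
    f = 1
    for k in range(2, (p - 1) // 2 + 1):
        f = f * k % p
    # Euclidean algorithm in Z[i] applied to p and f + i
    a, b = (p, 0), (f, 1)
    while b != (0, 0):
        a, b = b, g_mod(a, b)
    return (abs(a[0]), abs(a[1]))

for p in [5, 13, 17, 29, 101]:
    print(p, sum_of_two_squares(p))
```

Computing the factorial is slow for large p; this is meant only to make the argument concrete, not to be an efficient method.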
Remark 14.1.9 As a final note to the complex point of view, one may note 
that there is a way to view Pythagorean triples as Gaussian integers as well. 
In this case one notes that if $a^2 + b^2 = c^2$, then a + bi could represent the triple
in question, and moreover one can use Fact 13.1.7 to combine two such triples. 
Most remarkably, a variant of this operation applied to primitive triples can 
be used to put a group multiplication on that set! See [E.7.29] for more details, 
such as the multiplication involved and the structure of the group, which an 
inquiring reader may wish to relate to Remark 3.4.8 and similar facts. (See 
also Exercise 15.7.21.) 


14.2 More Sums of Squares and Beyond 


There are many interesting questions one can ask about sums of squares we 
have not even touched upon. Each of these is very worthy of independent study 
by undergraduates, and also ideal for computer exploration. 


14.2.1 Summing more squares 


Fact 14.2.1 Sums of three squares. A positive integer may be written as 
a sum of three squares if and only if it does not have the form of a product of 
an even power of two times an odd number which is congruent to seven modulo 
eight. 
Proof. We will skip the proof, but see Exercise 14.4.4 and Exercise 14.4.6. □
One might think at this point that even an arbitrary sum of squares might 
not represent every number, but we have this result (see also Exercise 14.4.7), 
first conjectured by our old friend Bachet. 


Fact 14.2.2 Lagrange’s four square theorem. Any nonnegative integer 
may be written as a sum of four squares. 

Proof. There are algebraic proofs using facts similar to Fact 14.1.8, and also
geometric proofs using (Minkowskian, see Remark 13.4.1) ideas similar to
those in Subsection 13.4.4. Both types of proof are interesting, because on the
one hand an algebraic proof can use the extension of the complex numbers
called the quaternions (footnote 2), while on the other hand a geometric proof shows
that geometric ideas can still work in more than two dimensions. □
One can generalize in many ways.


Example 14.2.3 For example, one can ask how many ways one can write a
number as a sum of three, four, etc. squares. In Exercise 13.7.7 we defined
$r_2(n)$ as giving the number of ways to write n as a sum of two squares; the
equivalent functions here would be $r_k(n)$ for sums of k squares. In that case, Lagrange’s
four square theorem above could be more succinctly stated as

$$r_4(n) \ge 1 \text{ for all } n \ge 0$$

But in general one may want to be able to compute this, or to give bounds for
it as a function of n. If you just can’t wait to learn more about the sort of
things known about $r_2(n)$, see Theorem 25.8.1.
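
For small n these counting functions can simply be brute-forced. Here is a minimal sketch in plain Python; `count_reps(k, n)` is my own name for the number of ordered k-tuples of integers whose squares sum to n, counting signs and order as in Exercise 13.7.7. It is hopelessly slow for large n or k, so treat it as an exploration tool only.

```python
from itertools import product
from math import isqrt

def count_reps(k, n):
    """Number of ordered k-tuples of integers (signs and order both count)
    whose squares sum to n."""
    bound = isqrt(n)
    return sum(1 for t in product(range(-bound, bound + 1), repeat=k)
               if sum(v * v for v in t) == n)

print([count_reps(2, n) for n in range(1, 11)])
print([count_reps(3, n) for n in range(1, 11)])
print(all(count_reps(4, n) >= 1 for n in range(0, 30)))
```

The zeros in the first two lists are the numbers that are not sums of two (respectively three) squares, while the last line checks Fact 14.2.2 on a small range.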


14.2.2 Beyond squares 
There are other directions one can generalize our questions. For instance: 


Question 14.2.4 What numbers can be written as a sum of ...
• Two cubes?
• Three cubes?
• k cubes?


It turns out that any number can be written as a sum of at most nine 
cubes. In the first half of the twentieth century, American mathematician 
L. E. Dickson proved this, and with the assistance of very substantial tables 
generated by hand by some of his assistants (before the advent of the digital 
computer!) he showed that every number except 23 and 239 can be represented 
by eight or fewer cubes! 

Alternately, one could keep the number of powers the same, but change the 
powers. 


Question 14.2.5 What numbers can be written as a sum of ...
• Two cubes?
• Two fourth powers?
• Two nth powers?


The reader should feel free to explore this in Exercise 14.4.8. Note that the
answers for odd powers will be very different if one allows negative numbers!
For a recent example of theory working with massive computation, see this
article (footnote 3) about writing 33 as a sum of three cubes (footnote 4).


2 See an excellent video (www.youtube.com/watch?v=d4EgbgTmeBg) by 3blue1brown (Grant Sanderson).
3 www.quantamagazine.org/sum-of-three-cubes-problem-solved-for-stubborn-number-33-20190326/
4 To be precise, $8866128975287528^3 + (-8778405442862239)^3 + (-2736111468807040)^3 = 33$.
The status of all positive integers less than 100 is now known; see Quanta magazine
(www.quantamagazine.org).
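
The identity for 33 quoted in the footnote is easy to check with exact integer arithmetic, which plain Python provides out of the box. The little search function `three_cubes` below is a hypothetical helper of my own for experimenting with small targets; it is nowhere near the methods used for 33 itself.

```python
# The three numbers quoted in the footnote about 33
x, y, z = 8866128975287528, -8778405442862239, -2736111468807040
print(x ** 3 + y ** 3 + z ** 3)

def three_cubes(n, bound):
    """Naive search for integers a <= b <= c in [-bound, bound] with a^3 + b^3 + c^3 == n."""
    for a in range(-bound, bound + 1):
        for b in range(a, bound + 1):
            for c in range(b, bound + 1):
                if a ** 3 + b ** 3 + c ** 3 == n:
                    return (a, b, c)
    return None

print(three_cubes(29, 5))
print(three_cubes(33, 20))   # a small search box finds nothing for 33
```

Comparing the size of the search box with the size of the numbers in the footnote gives some feeling for why 33 was so stubborn.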
Now it is time to recall our discussions in Section 3.4, alluded to in
Remark 13.1.4. In that situation, we essentially were looking for integer solutions
to

$$x^2 + y^2 = z^2$$

In fact, we characterized such triples x, y, z in Theorem 3.4.6.

But we can reinterpret this as a question in this context — when is a perfect 
square a sum of two squares? In that case, the previous question can be further 
specialized: 


Question 14.2.6 What perfect ...
• Cubes can be written as a sum of two cubes?
• Fourth powers can be written as a sum of two fourth powers?
• What about nth powers? What (integer) solutions are there to this?

$$x^n + y^n = z^n$$


Ordinarily, as author I would now send the reader to explore some of these
questions in Exercise 14.4.9. However, as we saw in Exercise 3.6.17 (see the
discussion at Corollary 3.4.13), Fermat already proved that other than trivial
solutions (such as writing $0^4 + (-1)^4 = 1^4$) there were no solutions in the case
n = 4. This is the simplest case of Fact 14.2.7. Euler nearly proved the same
statement for n = 3, but made a hidden assumption, the same one we will
examine shortly in discussing Fact 15.3.5 (as there, see [E.4.14] for a correct
proof).

There is a huge field which developed from these observations, but we 
will not digress much further upon it. If you recall the discussion in Subsec- 
tion 11.6.4, it turns out Germain originally investigated n in the case where it 
is one of the numbers now known as Germain primes (recall Subsection 11.6.4); 
see [E.5.2, Chapter 11] for an accessible introduction to her plan. 

Much of the field of algebraic number theory developed from pursuing this 
question in the nineteenth and early twentieth centuries. Finally in 1995 An- 
drew Wiles, along with his former student Richard Taylor, proved the following 
result via a very deep investigation of (among other things) elliptic curves (re- 
call the brief mention in Section 3.5). 


Fact 14.2.7 Fermat’s Last Theorem. For n > 2, there are no three positive
integers x, y, z such that

$$x^n + y^n = z^n$$

Proof. Hanc marginis exiguitas non caperet. □


14.2.3 Waring’s problem 


The English mathematician Edward Waring (footnote 5) asked for an outrageous
generalization of these questions of sums of powers, which is still an active area
of research called Waring’s Problem. The most important result is truly
spectacular.


Fact 14.2.8 Hilbert-Waring Theorem. For each positive integer power
m, there is a number g(m) such that every nonnegative integer can be written
as a sum of g(m) mth powers.


5 According to [E.5.3, Section 11-1], John Wilson of Wilson’s Theorem was his student.

There is even a potential formula that

$$g(m) = \left\lfloor \left(\frac{3}{2}\right)^m \right\rfloor + 2^m - 2$$

This has been verified for m out to many millions, and is conjectured to always
be true. The aforementioned Dickson (footnote 6) notes that this formula was first
conjectured by Euler’s son, Johann Albrecht. See [E.2.16, Section 7.6] for a nice
exposition of this, and see [E.4.26, Chapter 5] for Fact 14.2.8 itself.
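
To get a feel for this formula, one can compare it with a direct computation on a small range. The following is a minimal sketch in plain Python; `g_formula` evaluates the conjectured formula in exact integer arithmetic, and `min_powers` (my own name) tabulates the least number of positive mth powers needed for each n up to a limit. A finite table can of course only suggest g(m), never prove it.

```python
def g_formula(m):
    """Conjectured value floor((3/2)^m) + 2^m - 2, using exact integer arithmetic."""
    return 3 ** m // 2 ** m + 2 ** m - 2

def min_powers(limit, m):
    """best[n] = least number of positive m-th powers summing to n, for 0 <= n <= limit."""
    powers = []
    j = 1
    while j ** m <= limit:
        powers.append(j ** m)
        j += 1
    best = [0] * (limit + 1)
    for n in range(1, limit + 1):
        # use one power q, then the best known way to write what is left
        best[n] = 1 + min(best[n - q] for q in powers if q <= n)
    return best

print([g_formula(m) for m in range(2, 7)])
cubes = min_powers(500, 3)
print(max(cubes), [n for n in range(501) if cubes[n] == max(cubes)])
```

The second line of output shows which numbers up to 500 need the most cubes, which you can compare with Dickson's result mentioned earlier.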

On the other hand, the question of finding the smallest integer G(m) (for 
a given m) such that every sufficiently large number can be written as a sum 
of that many mth powers is still wide open. Perhaps you will explore it? (See 
e.g. Exercise 14.4.10 and Exercise 14.4.11.) 


14.3 Related Questions About Sums 


There is yet another generalization that will serve better as a lead-in to the 
next chapters. Think about the following two problems. 


• What numbers can be written as $x^2 + 2y^2$? (Think of it as $x^2 + y^2 + y^2$.)
• What numbers can be written as $x^2 + 3y^2$?


These are very natural generalizations to the “two squares” question. How 
could we approach them? Here’s one type of idea. 


Fact 14.3.1 No number

$$n \equiv 5 \text{ or } n \equiv 7 \pmod 8$$

can be written as $x^2 + 2y^2$.
Proof. Try all numbers modulo 8 and see what is possible! (See Exercise 14.4.3.)
□
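
The "try everything" step is a tiny finite computation, which a machine is happy to do. Here is a minimal sketch in plain Python; it only tabulates residues and is not a substitute for writing the argument out in Exercise 14.4.3.

```python
# All residues modulo 8 taken by x^2 + 2y^2, as x and y run over all residues mod 8
residues = sorted({(x * x + 2 * y * y) % 8 for x in range(8) for y in range(8)})
print(residues)

# The residues that never occur
missing = [r for r in range(8) if r not in residues]
print(missing)
```

Since $x^2 + 2y^2$ modulo 8 depends only on x and y modulo 8, these 64 cases really do cover every possibility.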
Already Fermat (unsurprisingly) claimed a partial converse to Fact 14.3.1. 
He stated that any prime number p which satisfies p = 1 or p = 3 (mod 8) 
could be written as a sum of a square and twice a square. 
This time, Euler wasn’t the one who proved it! But you could almost 
imagine that by factoring 


$$x^2 + 2y^2 = (x - \sqrt{2}\,i\,y)(x + \sqrt{2}\,i\,y)$$


you could start proving such things. When might a square root of two exist 
modulo p ... 
Here are some numbers which can be written in this form. 


@interact
def _(n=10):
    pretty_print(html("Using $a$ and $b$ up to $%s$:"%n))
    L=[a^2+2*b^2 for a in [0..n] for b in [0..n]]
    L.sort(); print(L)


In Exercise 14.4.12, you will try to discover a similar pattern for $x^2 + 3y^2$.
See also Section 15.4.


6 www.ams.org/journals/bull/1936-42-12/S0002-9904-1936-06432-3/S0002-9904-1936-06432-3.pdf
14.4 Exercises


1. Look up the concepts of ‘Gaussian moat’, ‘Gaussian zoo’, and/or ‘Gaussian
prime spiral’ and tell what you think!

2. Look up ‘Eisenstein integers’. Can you find any interesting theorems along
these lines which they prove? What would Eisenstein primes look like?
What about “Eisenstein triples”? (See [E.7.17] and Exercise 3.6.20.)

3. Finish proving Fact 14.3.1.

4. Find numbers writeable in two essentially different ways as a sum of three
squares (not just $1^2 + 0^2 + 0^2 = 0^2 + 1^2 + 0^2$ or even $3^2 + 4^2 + 1^2 = 0^2 + 5^2 + 1^2$).
(This was also Exercise 13.7.4.)

5. Show that two (separate) instances of Pythagorean triples can yield an
answer to the previous exercise in a clever way. (Thanks to Samuel Paquette.)

6. Show that an odd number which is congruent to seven modulo eight
may not be written as a sum of three squares, obviously without using
Fact 14.2.1. (This was also Exercise 13.7.3.)

7. Research Lagrange’s four-square theorem and write an essay about it;
which proof do you prefer?

8. Write a program in Sage (or another language) to explore which numbers
may be written as a sum of two cubes, two fourth powers, and so forth.

9. Write a program in Sage (or another language) to verify Fermat’s Last
Theorem for some small x, y, z and n.

10. Write a program in Sage (or another language) to compute g(m) and/or
G(m) in the Hilbert-Waring Theorem for small m.

11. For which m do results in this chapter give us information about g(m) or
G(m)? Be as specific as possible.

12. Look for a pattern, similar to the one we found for sums of squares, for
which primes can be written in the form $x^2 + 3y^2$. Prove that the primes
not of this form are impossible.

13. Yet another possible generalization of Pythagorean triples is to ask when
the sum of two perfect powers of the same degree is a perfect square, or
$x^n + y^n = z^2$. Explain why this is not so interesting when n is even,
and why when n = 3 we already have seen at least one solution. Then
do some experiments to conjecture whether there are solutions for prime
n > 3. (See [E.4.23, p. 255].)


Summary: Beyond Sums of Squares


In this chapter, we examine some optional (but amazing) additional questions
and directions the sums of squares can take us.


1. Gauss reimagined many questions. His introduction of Gaussian integers,
a complex-valued analogue to our integers, is not just related to sums of
squares, but provides its own interesting questions, such as those of what
complex prime numbers might look like in Fact 14.1.5.

2. The next section includes brief discussion of the topics surrounding three
amazing facts: Lagrange’s four square theorem, Fermat’s Last Theorem,
and the Hilbert-Waring Theorem.

3. We prepare to think about other sum questions that could be interpreted
geometrically in Section 14.3.


In keeping with its overall feel, the Exercises have more programming exercises
than usual, and some exploration.


Chapter 15 


Points on Curves 


We have already seen a lot of the geometric viewpoint of number theory; think 
about Section 13.4, for instance. 

The goal of the next several chapters is to examine what other questions
one can ask of a purely geometric nature, or how far geometry can go in
answering other questions.

This chapter returns to the notion of finding specific types of points on
graphs of number-theoretic equations. But instead of looking at lines as we
did before, there are a variety of curves we can consider.

For instance, our previous discussion about the sum of two squares was
essentially interpreted as asking when the curve $x^2 + y^2 = n$ has an (integer)
lattice point on it or not. We have completely answered this question.

But if we were considering $x^2 + y^2 = n$ to be about a circle of radius $\sqrt{n}$,
then $x^2 + 2y^2 = n$ must be about an ellipse! Here is a visualization of points
on a couple of these ellipses.


[Figure: two plots of ellipses on the integer grid, both axes running from −3 to 3]


Figure 15.0.1 Integer points $x^2 + 2y^2 = n$ for n = 3, 5


Notice that one of them has integer points, while the other does not. Try 
more below. 


var('x,y')
@interact
def _(n=3):
    # the ellipse itself
    plot1 = implicit_plot(x^2+2*y^2-n, (x,-n,n), (y,-n,n), plot_points=100)
    # the background grid of integer points
    grid_pts = [[i,j] for i in [-n..n] for j in [-n..n]]
    plot_grid_pts = points(grid_pts, rgbcolor=(0,0,0), pointsize=2)
    # the grid points that actually lie on the ellipse
    lattice_pts = [coords for coords in grid_pts if (2*coords[1]^2+coords[0]^2)==n]
    plot_lattice_pts = points(lattice_pts, rgbcolor=(0,0,1), pointsize=20)
    show(plot1 + plot_grid_pts + plot_lattice_pts, figsize=[5,5], aspect_ratio=1)
    pretty_print(html("The ellipse $x^2+2y^2=%s$"%n))
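
If you only want the list of integer points and not the picture, a few lines of plain Python suffice. This is a minimal sketch; `ellipse_points` is a hypothetical helper name of my own, and it simply tries each possible y and checks whether what is left over is a perfect square.

```python
from math import isqrt

def ellipse_points(n):
    """All integer points (x, y) with x^2 + 2*y^2 == n, in sorted order."""
    pts = set()
    y = 0
    while 2 * y * y <= n:
        rest = n - 2 * y * y
        x = isqrt(rest)
        if x * x == rest:
            # include every sign combination; the set removes duplicates when x or y is 0
            pts.update({(x, y), (-x, y), (x, -y), (-x, -y)})
        y += 1
    return sorted(pts)

for n in [3, 5, 9]:
    print(n, ellipse_points(n))
```

This matches what the pictures show for n = 3 and n = 5: one ellipse passes through lattice points and the other misses them all.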


Questions like this are at the heart of modern number theory — plus, there 
are such nice pictures! It turns out this investigation will have surprising 
connections to calculus and group theory too. 

With that in view, you may want to try to find integer points on the 
following curves. Each exemplifies a type we will discuss in this chapter. 


1. $y^3 = x^2 + 2$
2. $x^2 + 2y^2 = 9$
3. $x^2 - 2y^2 = 1$


What we will do is to slowly try to make our way to finding integer solutions 
to some more difficult Diophantine equations, using an idea about rationals 
which simplifies Pythagorean triple geometry. We’ll then return to the integer 
setup once we’ve gotten this background. 


15.1 Rational Points on Conics 


15.1.1 Rational points on the circle 


Remember that in Section 3.4 we thought of Pythagorean triples as solutions
to

$$x^2 + y^2 = z^2$$

Now, let’s divide the whole Pythagorean thing by $z^2$:

$$\frac{x^2}{z^2} + \frac{y^2}{z^2} = 1 \Rightarrow \left(\frac{x}{z}\right)^2 + \left(\frac{y}{z}\right)^2 = 1$$

Since we can always get any two rational numbers to have a common denominator,
what that means is the Pythagorean problem is the same as finding all
rational solutions to the simpler formula

$$a^2 + b^2 = 1,$$

which seems to be a very different problem. Let’s investigate this.


var('x,y')
@interact
def _(slope=-2/3):
    # the unit circle
    plot1 = implicit_plot(x^2+y^2-1, (x,-1.5,1.5), (y,-1.5,1.5), plot_points=100)
    # the line through (1,0) with the given slope
    plot2 = plot(slope*(x-1), x, -1.5, 1.5)
    # the second intersection point of the line and the circle
    plot3 = point(((slope^2-1)/(slope^2+1), -2*slope/(slope^2+1)), rgbcolor=(1,0,1), pointsize=20)
    show(plot1+plot2+plot3 + point((1,0), rgbcolor=(0,0,0), pointsize=20), figsize=[5,5], aspect_ratio=1)
In the interact above, the blue line intersects the circle $x^2 + y^2 = 1$ in the
point (1,0) and has rational slope denoted by slope. If you change the variable
slope, then the line will change.

It is not a hard exercise to see that the line through two rational points on
a curve will have rational slope, nor what its formula is, so that every rational
point on the circle is gotten by intersecting the circle with a line through (1,0) with rational slope.
This is not necessarily visible in Figure 15.1.1!


[Figure: the unit circle and a line through (1,0), both axes running from −1.5 to 1.5]


Figure 15.1.1 Intersecting a circle with a line of slope −2/3


It is a little harder to show that intersecting such a line with the circle 
always gives a rational point, but this is also true! It is also far more useful, 
as it gives us a technique to find all rational points and hence all Pythagorean 
triples. 

Fact 15.1.2 All lines with rational slope through (1,0) intersect the unit circle 
in a second rational point. 

Proof. In fact, we can do even better than prove this; we can get a formula for 
the points. 

First, any line with slope t has formula $y = t(x - 1)$. We can then obtain all 
intersections with the circle $x^2 + y^2 = 1$ by plugging in y, so: 

$$x^2 + (t(x - 1))^2 = 1 \Rightarrow x^2 + t^2x^2 - 2t^2x + t^2 = 1$$

We will skip the algebra (see Exercise 15.7.1) showing that the quadratic formula 
yields the two answers $x = \frac{t^2 \pm 1}{t^2 + 1}$. 

Note that $\frac{t^2 + 1}{t^2 + 1} = 1$ gives the point (1,0) which we already knew. The other, 
new, point is $x = \frac{t^2 - 1}{t^2 + 1}$; plugging this in gives $y = t\left(\frac{t^2 - 1}{t^2 + 1} - 1\right) = \frac{-2t}{t^2 + 1}$. In 
summary, every rational slope t gives us the point $\left(\frac{t^2 - 1}{t^2 + 1}, \frac{-2t}{t^2 + 1}\right)$. ∎ 


Even the inputs $t = 0$ and $t = \infty$ have an appropriate interpretation in this 
framework. Such a description of the (rational) points of the circle is called a 
parametrization. Plug in various t and see what you get! 
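
To experiment with the parametrization outside of Sage, here is a minimal sketch in plain Python. It assumes the formula $\left(\frac{t^2-1}{t^2+1}, \frac{-2t}{t^2+1}\right)$ from Fact 15.1.2; the helper names `circle_point` and `triple_from_slope` are made up for this illustration and are not part of any library.

```python
from fractions import Fraction
from math import gcd

def circle_point(t):
    """Second intersection of the unit circle with the line y = t(x - 1)."""
    t = Fraction(t)
    return ((t * t - 1) / (t * t + 1), -2 * t / (t * t + 1))

def triple_from_slope(t):
    """Clear denominators to turn that rational point into a Pythagorean triple."""
    x, y = circle_point(t)
    z = x.denominator * y.denominator // gcd(x.denominator, y.denominator)  # lcm
    return (abs(int(x * z)), abs(int(y * z)), z)

for t in [Fraction(-2, 3), Fraction(1, 2), Fraction(2), Fraction(3, 2)]:
    x, y = circle_point(t)
    assert x * x + y * y == 1      # exact rational arithmetic, no rounding
    print(t, (x, y), triple_from_slope(t))
```

Using `Fraction` keeps every coordinate exact, so the check that the point lies on the circle is a genuine equality rather than a floating-point approximation.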


Remark 15.1.3 You could start the whole process with $(-1,0)$ or $(0,1)$, use 
all lines through it with rational slopes, and get a different parametrization. 




15.1.2 Parametrization in general 


But will this always work? Certainly not every curve gets rational points by 
intersecting rational slope lines with it. 


Example 15.1.4 Consider the curve given by $y = x^3$ and the point (0,0). A 
rational slope line through that point would be $y = \frac{p}{q}x$. Substituting we get 

$$\frac{p}{q}x = x^3 \Rightarrow x^3 - \frac{p}{q}x = 0 \Rightarrow x\left(x^2 - \frac{p}{q}\right) = 0,$$

which clearly will have irrational x-coordinates for most choices of the slope 
$p/q$. 


In the quadratic context it works, though! Here is an amazing fact we will 
not prove. 


Fact 15.1.5 Suppose you have a curve given by a quadratic equation with 
rational coefficients which contains at least one rational point. Then all lines 
with rational slope (including vertical¹ lines) through that point on the curve 
intersect the curve in only rational points, and all rational points on the curve 
are generated in this way. 


Example 15.1.6 Here’s an example with $x^2 + 3y^2 = 1$. 

As in the proof of Fact 15.1.2, the line going through (1,0) has equation 
$y = t(x - 1)$. Here, the ellipse has equation $x^2 + 3y^2 = 1$, so that we must solve 
the equation 

$$x^2 + 3t^2(x - 1)^2 = 1 \Rightarrow x^2 + 3t^2x^2 - 6t^2x + 3t^2 - 1 = 0$$

for x to find a parametrization of x in terms of t. Figure 15.1.7 might help 
visualize the process. 


¹ The long reason for this is projective space; the short and not-quite-rigorous reason is 
that $\infty = 1/0$ is a rational fraction, right? ... Right? 

Figure 15.1.7 Intersecting an ellipse with a line of slope $-1/2$ 


Solving this equation seems daunting. Here are two strategies (see Exercise 15.7.2 
to try them). 

• We already know that there is a solution $x = 1$, so that $x - 1$ must be a 
factor of the expression! So we could factor it out if we wished. 

• Alternately, we could use the quadratic formula and discard the solution 
$x = 1$. 

In either case you should get 

$$x = \frac{3t^2 - 1}{3t^2 + 1}, \qquad y = \frac{-2t}{3t^2 + 1}.$$

Now you can find all kinds of interesting solutions like $\left(\frac{11}{13}, -\frac{4}{13}\right)$. 

Where does this go? One place these solutions lead is to integer solutions 
of three-variable equations. In the previous example, since x and y have a 
common denominator, we can just multiply through by the square of that 
denominator to get 

$$11^2 + 3(-4)^2 = 13^2.$$

One could consider this to be an integer point on the surface given by $x^2 + 3y^2 = z^2$, 
which you may play around with in the following interact if you are online. 


var('x,y') 
@interact 
def _(viewsize=15): 
    # upper half of the cone x^2 + 3y^2 = z^2 
    plot1=plot3d(sqrt(x^2+3*y^2), (x,-viewsize,viewsize), 
        (y,-viewsize/2,viewsize/2)) 
    grid_pts = [[i,j,k] for i in [-viewsize..viewsize] for j 
        in [-viewsize..viewsize] for k in [0..viewsize]] 
    # keep only the integer points lying on the surface 
    lattice_pts = [coords for coords in grid_pts if 
        (coords[0]^2+3*coords[1]^2==coords[2]^2)] 
    plot_lattice_pts = point3d(lattice_pts, rgbcolor = 
        (1,0,0),pointsize=40) 
    show(plot1+plot_lattice_pts) 


That is a rather non-obvious solution to this equation in three variables, to 
say the least, and only one of many that this method can help us find. 
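
As a rough illustration of how the parametrization produces such integer points, the following plain-Python sketch scales the rational point for a given slope up to the surface $x^2 + 3y^2 = z^2$. It assumes the formulas $x = \frac{3t^2-1}{3t^2+1}$, $y = \frac{-2t}{3t^2+1}$ for the ellipse $x^2 + 3y^2 = 1$; the function names are hypothetical helpers for this example only.

```python
from fractions import Fraction
from math import gcd

def ellipse_point(t):
    """Second intersection of x^2 + 3y^2 = 1 with the line y = t(x - 1)."""
    t = Fraction(t)
    return ((3 * t * t - 1) / (3 * t * t + 1), -2 * t / (3 * t * t + 1))

def integer_point(t):
    """Scale the rational point to an integer point (x, y, z) on x^2 + 3y^2 = z^2."""
    x, y = ellipse_point(t)
    z = x.denominator * y.denominator // gcd(x.denominator, y.denominator)  # lcm
    return (int(x * z), int(y * z), z)

for t in [1, 2, 3, Fraction(1, 2)]:
    X, Y, Z = integer_point(t)
    assert X * X + 3 * Y * Y == Z * Z   # the scaled point really is on the surface
    print(t, ellipse_point(t), (X, Y, Z))
```

The slope $t = 2$ recovers the point used in the text, and other slopes give further integer points on the same surface.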


15.1.3 When curves don’t have rational points 


However, the rational slope method does not always work. Namely, you need 
at least one rational point to start off with! And what if no such point 
exists? It turns out that Diophantus already knew of some such curves. 


Fact 15.1.8 The circle $x^2 + y^2 = 15$ has no rational points. 
Proof. First, note this is a much stronger statement than what we already 
know, which is that this curve has no integer points (see Fact 13.1.1). The way 
to prove this is to correspond rational points on the circle to integer points on 
the surface $x^2 + y^2 = 15z^2$. 
Every rational point on the circle can be written using a common denominator 
as $(p/q, r/q)$ for some $p, r, q \in \mathbb{Z}$, where we cancel any common divisor of all 
three numbers. Then simply multiplying through by $q^2$ gives integer points 
$(x,y,z) = (p,r,q)$ on the surface. (This isn’t a one-to-one correspondence, as 
the surface point (0,0,0) shows.) 
But now consider the whole equation $p^2 + r^2 = 15q^2$ modulo 4. The reader 
should definitely check that there are no legitimate possibilities! (See Exercise 15.7.5; 
don’t forget that the rational points are written in lowest terms.) 
∎ 
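
The modulo 4 check in this proof is a finite computation, so it can be sketched by brute force. The plain-Python snippet below (an illustration only, with variable names chosen here) lists every residue triple satisfying $p^2 + r^2 \equiv 15q^2 \pmod 4$ and tests whether any of them could come from a fraction in lowest terms.

```python
# Residues (p, r, q) modulo 4 that satisfy p^2 + r^2 = 15 q^2 (mod 4).
solutions = [(p, r, q)
             for p in range(4) for r in range(4) for q in range(4)
             if (p * p + r * r - 15 * q * q) % 4 == 0]

# If every surviving triple has p, r and q all even, then 2 is a common
# divisor, which is incompatible with having cancelled common divisors.
all_even = all(p % 2 == 0 and r % 2 == 0 and q % 2 == 0 for p, r, q in solutions)

print(solutions)
print(all_even)
```

This only confirms the congruence step; the argument that lowest terms rules out a common factor of 2 is still the reader's job.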
As we can see experimentally in the interact below, there are no rational 
points on a circle of radius $\sqrt{15}$ because there are no integer points on the 
corresponding surface other than ones with $x, y = 0$, and those correspond 
to $z = 0$, which would give a zero denominator on the circle. Here is a place 
where rational points are illuminated by questions of integer points rather than 
vice versa. 


var('x,y') 
@interact 
def _(viewsize=15): 
    # the surface x^2 + y^2 = 15z^2 in the first octant 
    plot1=plot3d(sqrt(x^2+y^2)/sqrt(15), (x,0,viewsize), 
        (y,0,viewsize)) 
    grid_pts = [[i,j,k] for i in [0..viewsize] for j in 
        [0..viewsize] for k in [0..3*viewsize]] 
    lattice_pts = [coords for coords in grid_pts if 
        (coords[0]^2+coords[1]^2==15*coords[2]^2)] 
    plot_lattice_pts = point3d(lattice_pts, rgbcolor = 
        (1,0,0),pointsize=40) 
    show(plot1+plot_lattice_pts) 


Let’s do another example. 


Example 15.1.9 Try to find rational points on the ellipse $2x^2 + 3y^2 = 1$. 

Solution. A rational point would correspond to integer points on $2x^2 + 3y^2 = z^2$ 
(again with no common divisor). You can try looking at it modulo four, but that 
goes nowhere. Instead, given the three as a coefficient, look at it modulo 3! 

If 3 divided x, then 3 would divide z, hence 9 would divide $3y^2$ and 3 would 
divide y as well, contradicting lowest terms. So x is invertible modulo 3, and 
in this case the equation reduces to 

$$2 \equiv (zx^{-1})^2 \pmod{3}.$$

This is impossible since [0], [1], [2] all square to [0] or [1] in $\mathbb{Z}_3$. 


The point is that, at least sometimes, modular arithmetic and going back and 
forth between integer and rational points helps us find points, or prove there 
are no such points. 
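
To see concretely why modulo four "goes nowhere" for $2x^2 + 3y^2 = z^2$ while the prime 3 settles the matter, here is a small brute-force sketch in plain Python. It works modulo 9 rather than 3 so that all three coordinates are forced to be divisible by 3; that choice of modulus, and the helper name `residue_solutions`, are assumptions of this illustration rather than anything from the text.

```python
def residue_solutions(m):
    """All (x, y, z) modulo m with 2x^2 + 3y^2 = z^2 (mod m)."""
    return [(x, y, z)
            for x in range(m) for y in range(m) for z in range(m)
            if (2 * x * x + 3 * y * y - z * z) % m == 0]

# Modulo 4 there are solutions with an odd coordinate, so no contradiction arises.
mod4_has_primitive = any(x % 2 or y % 2 or z % 2 for x, y, z in residue_solutions(4))

# Modulo 9 every solution has x, y and z all divisible by 3,
# which a triple with no common divisor cannot satisfy.
mod9_all_divisible = all(x % 3 == 0 and y % 3 == 0 and z % 3 == 0
                         for x, y, z in residue_solutions(9))

print(mod4_has_primitive, mod9_all_divisible)
```

A search like this is a quick way to decide which modulus is worth trying by hand before writing out the argument.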


15.2 A tempting cubic interlude 


It is interesting that our investigation of rational points, initially motivated by 
integer points like Pythagorean triples, inevitably led back to integer points. 
Soon we will look at some remarkable properties that sets of integer points on 
certain curves have, and whether any such points even exist. 

But before moving on, it is worth looking at some interesting tidbits relating 
to another type of equation, $x^3 + ay^3 = b$. 

For the first example, consider that sometimes mathematicians like to explore 
hard questions for their own sake. Sometimes proofs are very challenging, 
indeed. Then again, sometimes a very easy proof is missed. 

One example of this is the equation $x^3 - 117y^3 = 5$. At one point a well-known 
number theorist specializing in Diophantine equations asserted this was 
known to have few solutions. A few years later, using field theory, this was 
proved. 

Two years later, a note was published in an obscure Romanian journal showing 
that if one reduces the original equation modulo nine, a simple congruence 
is obtained which one can show has no solutions just by trying all possibilities 
by hand (you can try it in Exercise 15.7.6). (See this MathOverflow question² 
for background.) 
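
The "trying all possibilities" step is small enough to sketch in a few lines of plain Python. This is only an illustration of the finite check (and it does give away the exercise), using the fact that $117 = 9 \cdot 13$.

```python
# Cubes modulo 9.
cubes_mod_9 = sorted({(x ** 3) % 9 for x in range(9)})
print(cubes_mod_9)

# Since 117 = 9 * 13, the left side x^3 - 117 y^3 reduces to x^3 modulo 9.
lhs_residues = {(x ** 3 - 117 * y ** 3) % 9 for x in range(9) for y in range(9)}
print(5 in lhs_residues)
```

If 5 never appears among the residues of the left-hand side, the equation cannot hold for any integers.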

Another interesting story related to this is that of Henry Dudeney’s “Puzzle 
of the Doctor of Physic”, related by Andrew Bremner of Arizona State Univer- 
sity in [E.7.15]. Dudeney was one of the most famous puzzle constructors of a 
century ago, and this puzzle is a doozy. 


Question 15.2.1 Find the (rational) diameters of two spheres whose combined 
volume is that of two spheres of diameters one foot and two feet. 


This is equivalent to finding rational points on the curve $x^3 + y^3 = 9$. The 
puzzle itself gives the points (1,2) and (2,1), so the question is whether one 
can find any other such points. Bremner takes the reader through a geometric 
tour of trying to intersect this curve with various lines with rational slope in 
the hope of finding a proper solution to this problem. 

Figure 15.2.2 gives a potential first step, using the tangent line to the curve 
at (2,1). 


² mathoverflow.net/questions/42512/awfully-sophisticated-proof-for-simple-facts 

Figure 15.2.2 Finding a rational point on Dudeney’s curve 


It turns out that this point is not acceptable as a solution (why?). In fact, 
it takes several more steps of connecting points to arrive at a solution, namely 

$$\left(\frac{415280564497}{348671682660}, \frac{676702467503}{348671682660}\right),$$

which does seem a bit excessive but sure is fun³. 
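
Numbers this large are easy to mistype, so an exact check is reassuring. The sketch below uses Python's `fractions` module to test points against $x^3 + y^3 = 9$. The first point is my own computation of where the tangent line at (2,1), which has slope $-4$ and equation $y = 9 - 4x$, meets the curve again; it is an assumption of this example that this is the point shown in Figure 15.2.2.

```python
from fractions import Fraction

def on_dudeney_curve(x, y):
    """Exact test of x^3 + y^3 = 9 using rational arithmetic."""
    return Fraction(x) ** 3 + Fraction(y) ** 3 == 9

# Other intersection of the tangent line y = 9 - 4x with the curve.
first_step = (Fraction(20, 7), Fraction(-17, 7))

# The solution quoted in the text.
final = (Fraction(415280564497, 348671682660),
         Fraction(676702467503, 348671682660))

print(on_dudeney_curve(*first_step), first_step)
print(on_dudeney_curve(*final), [float(c) for c in final])
```

Comparing the signs of the coordinates in the two points is a good hint for the "(why?)" above.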

There are endless variations on such questions. If we consider Dudeney’s 
problem as an example of summing two perfect cubes to make a perfect square 
(here $9 = 3^2$), we have a more general question that Diophantus and al-Karaji 
explored for their rational solutions. 

We are now ready to begin our discussion of more integer points on curves. 
As mentioned before, we’ll try to find integer points on the following types of 
curves: 

• $x^3 = y^2 + 2$ (sometimes called the Bachet equation) 

• $x^2 + 2y^2 = 9$ (a well-known friend, the ellipse) 

• $x^2 - 2y^2 = 1$ (a hyperbola with surprising connections to $\sqrt{2}$) 

³ For an even more fun puzzle that swept the internet a few years back, search quora.com 
for an answer to a question about ‘how do you find the positive integer solutions’ to 
$\frac{x}{y+z} + \frac{y}{z+x} + \frac{z}{x+y} = 4$, based on a paper by Bremner and Macleod. 

15.3 Bachet and Mordell Curves 


Let’s start by talking about $x^3 = y^2 + 2$ as a type of curve. Recall from 
Historical remark 3.5.3 that Bachet de Méziriac first asserted this had one 
positive integer solution in 1621, very early in the development of modern 
number theory. 

Example 15.3.1 What is that solution? (Even if you don’t remember, you 
should be able to find it quickly.) 

Recall also that Fermat, Wallis, and Euler studied this equation and 
gave various discussions and proofs of the uniqueness of its solution. As we 
first saw in Section 3.5, this equation is actually one of a more general class of 
equations called the Mordell equation: 

$$x^3 = y^2 + k, \quad k \in \mathbb{Z}.$$

Historical remark 15.3.2 Louis Mordell. Louis Mordell was an early 20th-century 
American-born British mathematician. He proved some remarkable 
theorems about this class of equations. We have already seen that these are 
nontrivial, and that some have no solution (Proposition 7.6.3, or see 
Fact 15.3.3 below). Even deciding whether there are any solutions turns out to 
be quite tricky; Helmut Richter has a somewhat old website⁴ with some tables 
of what is known about integer solutions. 

Notice that Mordell’s set of curves are not quadratic/conic, but rather a set 
of cubic curves. Actually, as mentioned before, they are examples of a special 
type of elliptic curves, which makes them more mysterious (and, as it happens, 
more useful for cryptography — we allude to this briefly in Subsection 11.5.1). 

One of Mordell’s remarkable theorems states that, for a given k, the equa- 
tion can only have finitely many integer points (in fact, there are even useful 
bounds for how many that depend only on the prime factorization of k). At 
the same time, Mordell curves are apparently “simple” enough that they can 
still have infinitely many rational points (see Theorem 15.3.6). Gerd Faltings 
won a Fields Medal for proving that higher-degree curves cannot have infinitely 
many rational points. If you are online, see which points you can find in the 
interact below. 


var('x,y') 
@interact 
def _(k=(2,[-15..15]), viewsize=10): 
    g(x,y)=x^3-y^2 
    # the Mordell curve x^3 = y^2 + k 
    plot1 = implicit_plot(g-k, (-viewsize,viewsize), 
        (-viewsize,viewsize), plot_points = 100) 
    grid_pts = [[i,j] for i in [-viewsize..viewsize] for j 
        in [-viewsize..viewsize]] 
    plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2) 
    # integer points in the window that lie on the curve 
    lattice_pts = [coords for coords in grid_pts if 
        (coords[0]^3-coords[1]^2==k)] 
    plot_lattice_pts = points(lattice_pts, rgbcolor = 
        (0,0,1),pointsize=20) 
    show(plot1+plot_grid_pts+plot_lattice_pts, figsize = 
        [5,5], xmin = -viewsize, xmax = viewsize, ymin = 
        -viewsize, ymax = viewsize) 
    pretty_print(html("Integer points on the Mordell equation $x^3=y^2+%s$ in this window"%k)) 
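
For readers without Sage at hand, the same window search can be sketched in plain Python. The function name `mordell_points` and the choice to bound only $|x|$ are assumptions of this example; like the interact, it can only report points inside the window and proves nothing about points outside it.

```python
from math import isqrt

def mordell_points(k, bound):
    """Integer points (x, y) with |x| <= bound on x^3 = y^2 + k."""
    points = []
    for x in range(-bound, bound + 1):
        y_squared = x ** 3 - k
        if y_squared < 0:
            continue                      # no real y for this x
        y = isqrt(y_squared)
        if y * y == y_squared:            # y_squared is a perfect square
            points.append((x, y))
            if y != 0:
                points.append((x, -y))
    return points

print(mordell_points(2, 100))     # the Bachet equation
print(mordell_points(-1, 100))    # k = -1
```

Searching over x and testing whether $x^3 - k$ is a perfect square is much cheaper than scanning a full grid of (x, y) pairs.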


⁴ hr.userweb.mwn.de/numb/mordell.html 

15.3.1 Verifying points don’t exist 


Proving things about Mordell’s equation is quite tricky, but once in a while 
there is something you can do. For instance, we can verify something we can 
see in the interact above. 

Fact 15.3.3 There are no integer solutions to $x^3 = y^2 - 7$. 

Proof. Recall that we nearly finished the proof of this in Proposition 7.6.3! We 
had reduced to showing that 

$$y^2 + 1 = (x + 2)(x^2 - 2x + 4)$$

was impossible if no prime of the form $p = 4n + 3$ could divide $y^2 + 1$. 

Indeed no such prime can divide $y^2 + 1$, because Fact 13.3.2 implies there are 
no square roots of $-1$ modulo p for this type of p. ∎ 

Fact 15.3.3 is a simple version (the case M = 2, N = 1) of the following far 
more general statement. 

Theorem 15.3.4 If the following hold: 

• $M \equiv 2 \pmod{4}$, 

• $N \equiv 1 \pmod{2}$, and 

• all prime divisors p of N are of the form $4k + 1$, 

then there is no solution to 

$$x^3 = y^2 - (M^3 - N^2).$$

Proof. The proof basically follows the same outline as Proposition 7.6.3 with 
Fact 15.3.3. See Exercise 15.7.8. ∎ 
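
A finite search cannot prove a statement like Theorem 15.3.4, but it is a useful sanity check on the hypotheses. The plain-Python sketch below reads the theorem as "no integer solutions to $x^3 = y^2 - (M^3 - N^2)$", picks a few pairs (M, N) that I chose to satisfy the three conditions, and looks for solutions in a window. The function name and the window size are arbitrary choices for this illustration.

```python
from math import isqrt

def has_solution(k, bound):
    """Search |x| <= bound for an integer point on x^3 = y^2 - k."""
    for x in range(-bound, bound + 1):
        y_squared = x ** 3 + k
        if y_squared >= 0 and isqrt(y_squared) ** 2 == y_squared:
            return True
    return False

# M = 2 (mod 4), N odd, and every prime dividing N is 1 (mod 4).
for M, N in [(2, 1), (2, 5), (6, 1), (6, 13)]:
    k = M ** 3 - N ** 2
    print(M, N, k, has_solution(k, 1000))

# For contrast: the Bachet equation x^3 = y^2 + 2 corresponds to k = -2 here.
print(has_solution(-2, 1000))
```

The pair (2, 1) is exactly Fact 15.3.3, and the final line shows the search does find a point when one exists.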

One can prove lots of similar statements using only congruence considerations⁵. 
The previous theorem is [E.4.9, Theorem 14.1.2], and that text has 
several other interesting variants. See Conrad’s notes⁶ and [E.2.8, Theorem 
7.4C.1] for even more special cases. See Subsection 17.5.4 for some other 
examples (without proof) of how knowing when square roots exist helps solve 
Mordell equations. 

But there is a larger point to make, based on the very specific conditions 
on M and N. Namely, if we want to prove anything about such equations with 
methods we currently have access to in this text, we have no hope of getting 
any interesting general results. 


15.3.2 More on Mordell 


Let’s see what I mean by “no hope” here by returning to Bachet’s original 
equation, $x^3 = y^2 + 2$. What are some naive things we can say? 

• It should be clear that x and y must have the same parity. 

• If they are both even then $x^3$ is divisible by 4, but $y^2 + 2 \equiv 2 \pmod{4}$, 
which is impossible. 

• So x and y are both odd. 

⁵ As one might expect, with more power more can be done. See [E.2.16, Section 11.6] or 
[E.4.9, Section 14.2] for results using the class number from Remark 13.3.4. 
⁶ www.math.uconn.edu/~kconrad/blurbs/gradnumthy/mordelleqn1.pdf 

That doesn’t really narrow things down much. 
Now, Euler nearly proves the following fact. 


Fact 15.3.5 The only positive solution to the Bachet equation is $x = 3, y = 5$. 
Proof. Proving this is already a little sophisticated, and is closely connected to 
the use of complex numbers in Section 14.1. Here we will give the idea behind 
Euler’s ‘proof’. 

In examining $a^2 + b^2$, we factored it as $(a + bi)(a - bi)$ using a square root of 
negative 1 (relative to $\mathbb{Z}$). Similarly, we would like to factor the $y^2 + 2$. But it 
can’t be done in $\mathbb{Z}[i]$. 

Instead, we could try to use the square root of $-2$, and define 

$$\mathbb{Z}[\sqrt{-2}] = \{a + b\sqrt{-2} \mid a, b \in \mathbb{Z}\}.$$

Then 

$$x^3 = (y - \sqrt{-2})(y + \sqrt{-2}).$$

We haven’t done anything with cubes yet ... 

Here is the tricky bit. In the integers, if $x^3 = pq$ and $\gcd(p,q) = 1$, then p 
and q must both be perfect (integer) cubes. So Euler assumes this works in 
$\mathbb{Z}[\sqrt{-2}]$ as well, and that the factors of $y^2 + 2$ are “coprime” (whatever that 
means in this new number system). (A very nice discussion of this is in [E.4.14], 
including a full proof in its appendix.) 

Then some basic algebraic manipulation of 

$$y - \sqrt{-2} = (a + b\sqrt{-2})^3$$

and divisibility considerations end up showing that $b \mid 1$ and $a = \pm b$, which 
ends up implying $y = \pm 5$ and $x = 3$. (We will not take this further; see 
Exercise 15.7.10.) ∎ 
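
The "basic algebraic manipulation" can be explored numerically. In the plain-Python sketch below an element $a + b\sqrt{-2}$ is stored as the integer pair (a, b); this representation and the helper names `mul` and `cube` are assumptions of the example. It searches a small box for pairs whose cube has the shape $y - \sqrt{-2}$, that is, second component $-1$, and recovers x as $a^2 + 2b^2$.

```python
def mul(u, v):
    """Multiply a + b*sqrt(-2) and c + d*sqrt(-2), stored as integer pairs."""
    a, b = u
    c, d = v
    return (a * c - 2 * b * d, a * d + b * c)

def cube(u):
    return mul(u, mul(u, u))

# y - sqrt(-2) is the pair (y, -1); look for small (a, b) whose cube has that shape.
hits = [((a, b), cube((a, b)))
        for a in range(-5, 6) for b in range(-5, 6)
        if cube((a, b))[1] == -1]

for (a, b), (y, _) in hits:
    x = a * a + 2 * b * b          # product of a + b*sqrt(-2) with its conjugate
    print((a, b), "gives y =", y, "and x =", x, "check:", x ** 3 == y * y + 2)
```

The search box is tiny, so this only illustrates the computation; the divisibility argument in the text is what shows there are no other candidates.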


Where’s the problem? It turns out you can say that if a product of coprime 
numbers is a cube, then the factors are cubes in this situation; however, 
it requires some (geometrically motivated) proof, just like with $\mathbb{Z}[i]$. In his 
1765 “Vollständige Anleitung zur Algebra”, sections 187-188 and 191, Euler 
explicitly says that this just works, in any number system of the form $\mathbb{Z}[\sqrt{c}]$. He 
solves the original Bachet equation in section 193, and solves $x^3 = y^2 + 4$ using 
the same technique in section 192, without realizing he had not proved this 
implicit assumption. (This is the same assumption he tacitly made in examining 
Fermat’s Last Theorem for the case n = 3.) 

But we shouldn’t be too hard on Euler! He was one of the first people to 
even consider some essentially random new number system obtained by adjoining 
$\sqrt{c}$ (for some integer c) to the integers. And as noted in Example 3.5.4, 
in 1738 he gave a correct and full proof of the observation that 8 and 9 is the 
only time a perfect square is preceded by a perfect cube, which is Mordell’s 
equation for $k = -1$. (See also Question 3.5.5.) 

If you are interested in more information about how to prove cases of 
Mordell’s equation, there are many good resources, including a nice one on 
Keith Conrad’s website⁷. But even finding a bound on the size of solutions to 
Mordell’s equation for a given k is tricky. 


⁷ www.math.uconn.edu/~kconrad/blurbs/gradnumthy/mordelleqn1.pdf 

• Mordell, Siegel, and Thue all had a part after World War I in showing 
there are finitely many solutions for a given k, but said nothing about 
how big x and y might be. 

• An early bound on the size of the numbers was that 


$$|x| < e^{10^{10}|k|^{10^4}},$$


which is of course ridiculously huge. 


• More recent conjectures are that x has absolute value less than $e^{C}|k|^{2+\epsilon}$, 
where $\epsilon$ is as small as you want and C seems to be pretty close to one, 
probably less than two. 


We cannot close discussion of this topic without a final very famous result 
carrying Mordell’s name. Recall that these curves can have infinitely many 
rational points, even if they have finitely many (or zero) integer points. The 
following is a bit of a surprise, then; the rational points can still be described 
finitely. 


Theorem 15.3.6 Mordell’s Theorem. Essentially, the set of (rational) 
points on a Mordell curve is a combination of finitely many “cyclic” (recall 
Fact 14.2.7) groups (in a very specific way I will not describe), and so it can 
be described using finitely many of the rational points. 

If you like, the number of rational points might be infinite, but not too 
infinite. 


15.4 Points on Quadratic Curves 


On the other hand, finding lattice points on a quadratic curve is much more 
tractable. This is because we understand conic sections so well, after having 
worked with them for two thousand years! 


Figure 15.4.1 Integer points on $x^2 + 2y^2 = 9$ 

In Figure 15.4.1 we see our second prototype, $x^2 + 2y^2 = 9$. You can see that, 
in addition to the obvious solution where y = 0, there is the (nearly as obvious, 
because the numbers are small, but still interesting) solution x = 1, y = 2. 

In general, for our purposes an ellipse is special because there are only 
finitely many lattice points to check. So much for the computational problem: 
just get a fast computer! 

However, in this chapter I’d like to just start investigating where a general 
theory for such things might come from. After all, it gets harder to check with 
“industrial strength” ellipses, and we want theorems. This section gives two 
hints of an algebraic nature; we will take up a third, more geometric hint 
toward the end of the chapter. 
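
To make "only finitely many lattice points to check" concrete, here is a minimal plain-Python sketch that enumerates them for an ellipse of the form $ax^2 + by^2 = n$ with positive integer coefficients. The function name and signature are invented for this example.

```python
from math import isqrt

def ellipse_lattice_points(a, b, n):
    """All integer (x, y) with a*x^2 + b*y^2 = n, for positive integers a, b, n."""
    points = []
    x_max = isqrt(n // a)                 # a*x^2 <= n bounds the search
    for x in range(-x_max, x_max + 1):
        rest = n - a * x * x
        if rest % b:
            continue                      # b*y^2 = rest has no integer solution
        y = isqrt(rest // b)
        if y * y == rest // b:
            points.extend({(x, y), (x, -y)})   # the set avoids listing y = 0 twice
    return sorted(points)

print(ellipse_lattice_points(1, 2, 9))
```

The bound on x is what makes an ellipse a finite problem; no such bound exists for a hyperbola or a parabola.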


15.4.1 Transforming conic sections 


Although it’s being removed from the curriculum nowadays, students in high 
school mathematics or first-year college calculus often learn how to use matrices 
to transform one conic section to another of the same type. 


Example 15.4.2 We can get from the circle $x^2 + y^2 = 9$ to the ellipse $x^2 + 2y^2 = 9$ 
by multiplying the vector (x,y) by the matrix $\begin{pmatrix} 1 & 0 \\ 0 & 1/\sqrt{2} \end{pmatrix}$; that would not 
stretch the x-axis, but shrinks the y-axis by the appropriate amount. 


Since this approach uses matrices with non-integer coefficients, it might not 
seem promising to use matrices. However, one can also think of both conics in 
such a transformation as coming from matrices. 

Compare the following so-called quadratic forms: 

$$\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = x^2 + y^2, \qquad \begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = x^2 + 2y^2.$$

Fermat’s question essentially asked what integers n can be represented as the 
first one; Gauss was interested in extending this to ask which numbers are 
representable by a more general expression of the form $ax^2 + 2bxy + cy^2$. This 
generalizes the sum of squares where a = 1 = c, b = 0, and is achieved by using 
the matrix $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ instead. It turns out that many such expressions represent 
precisely the same sets of integers (recall Section 14.3). 

The Sage reference manual⁸ uses our example to demonstrate. Consider 
two seemingly unrelated expressions: 

$$\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = x^2 + 2y^2 \quad \text{and} \quad \begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = x^2 + 2xy + 3y^2.$$

By the theory of quadratic forms, Fermat’s result (recall the discussion around 
Fact 14.3.1) that a prime number congruent to 1 or 3 modulo 8 can be written 
as a sum of a square and twice a square should apply to the second expression 
as well. 

As an example, both should represent the number 11. Clearly $11 = 3^2 + 2 \cdot 1^2$ 
works for the first one, but what about $x^2 + 2xy + 3y^2$? One can try this out 
in the interact below. 


import math  # for math.sqrt below 
var('x,y') 
@interact(layout=[['a','b'],['c','d'],['output']]) 
def _(a=1,b=1,c=1,d=3,output=11): 
    viewsize=ceil(math.sqrt(output)+1) 
    # the quadratic form with matrix [[a,b],[c,d]] 
    g(x,y)=a*x^2+(b+c)*x*y+d*y^2 
    plot1 = implicit_plot(g-output, (x,-viewsize,viewsize), 
        (y,-viewsize,viewsize), plot_points = 200) 
    grid_pts = [[i,j] for i in [-viewsize..viewsize] for j 
        in [-viewsize..viewsize]] 
    plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2) 
    # integer points where the form takes the value output 
    lattice_pts = [coords for coords in grid_pts if 
        (a*coords[0]^2 + (b+c)*coords[0]*coords[1] + 
        d*coords[1]^2 == output)] 
    plot_lattice_pts = points(lattice_pts, rgbcolor = 
        (0,0,1),pointsize=20) 
    show(plot1+plot_grid_pts+plot_lattice_pts, figsize = 
        [5,5], xmin = -viewsize, xmax = viewsize, ymin = 
        -viewsize, ymax = viewsize, aspect_ratio=1) 
    pretty_print(html("Integer lattice points on $%sx^2+%sxy+%sy^2=%s$"%(a,b+c,d,output))) 

⁸ doc.sagemath.org/html/en/reference/quadratic_forms/sage/quadratic_forms/binary_qf.html 


Looks like x = 2, y = 1 will do it! 
In this case, the mystery is not deep; we can go between the two expressions 
with the coordinate transformation 


$$x^2 + 2xy + 3y^2 = (x + y)^2 + 2y^2.$$


In general there is some very deep theory involved in deciding which integers 
can be represented by various forms, which is another place where lie the 
beginnings of algebraic number theory, just like with the Gaussian integers. 
But we'll let it rest there. 
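
The claim that the two forms represent the same integers via a change of coordinates can be checked directly for the value 11. The sketch below is plain Python; `representations` is a hypothetical helper that searches a small box, and it uses the identity $x^2 + 2xy + 3y^2 = (x+y)^2 + 2y^2$, so the substitution $(x, y) \mapsto (x + y, y)$ should carry solutions of the second form to solutions of the first.

```python
def representations(a, b, c, n, bound):
    """Integer (x, y) with |x|, |y| <= bound and a*x^2 + 2*b*x*y + c*y^2 = n."""
    return [(x, y)
            for x in range(-bound, bound + 1)
            for y in range(-bound, bound + 1)
            if a * x * x + 2 * b * x * y + c * y * y == n]

first = representations(1, 0, 2, 11, 6)     # x^2 + 2y^2 = 11
second = representations(1, 1, 3, 11, 6)    # x^2 + 2xy + 3y^2 = 11

# Apply (x, y) -> (x + y, y) to the solutions of the second form.
transformed = sorted((x + y, y) for x, y in second)

print(first)
print(second)
print(transformed == sorted(first))
```

Because the substitution has an integer inverse, $(x, y) \mapsto (x - y, y)$, it matches the two solution sets exactly rather than just mapping one into the other.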


15.4.2 More conic sections 


Let’s trace back to looking for integer points on a given curve. Assuming that 
ellipses are doable by simply counting, what is next? 

The parabola comes to mind. A straightforward parabola could look like 
$ny = mx^2$; this can be thought of more directly as $y = ax^2$, with $a = m/n$ in 
lowest terms. 

Then I can just check all $x \in \mathbb{Z}$ such that $n \mid mx^2$. Since $\gcd(m,n) = 1$ 
(again, lowest terms), we would just need to check that $n \mid x^2$ (so if n is prime, 
$n \mid x$ suffices). 

Example 15.4.3 If $y = mx^2$ for integer m, any x will do. That makes sense; 
integer input had better give integer output, which would be a lattice point! 

Example 15.4.4 If $2y = x^2$, we just look at it as requiring $2 \mid x^2$. Then any 
even x will yield a lattice point, and odd x will not. 

Figure 15.4.5 Integer points on $2y = x^2$ 


It is not hard to come up with simple divisibility criteria for other parabolas. 
Try the following interact to check your own hypotheses. 


@interact 
def _(m=1,n=2): 
    viewsize=3*n 
    f(x)=(m/n)*x^2 
    plot1 = plot(f,-viewsize,viewsize) 
    grid_pts = [[i,j] for i in [-viewsize..viewsize] for j 
        in [0..viewsize^2*(m/n)]] 
    plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2) 
    # integer points on the parabola ny = mx^2 
    lattice_pts = [coords for coords in grid_pts if 
        (m*coords[0]^2==n*coords[1])] 
    plot_lattice_pts = points(lattice_pts, rgbcolor = 
        (0,0,1),pointsize=20) 
    show(plot1+plot_grid_pts+plot_lattice_pts, figsize = 
        [5,5], xmin = -viewsize, xmax = viewsize, ymin = -1, 
        ymax = (m/n)*viewsize^2) 
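
The divisibility criterion itself needs no plotting. As a small plain-Python sketch (the function name is made up here), the following lists the x values in a window for which $ny = mx^2$ has an integer y, which is one way to test a hypothesis about a composite n before trying to prove it.

```python
def lattice_x_values(m, n, bound):
    """x in [-bound, bound] for which n*y = m*x^2 has an integer y."""
    return [x for x in range(-bound, bound + 1) if (m * x * x) % n == 0]

print(lattice_x_values(1, 2, 6))     # 2y = x^2
print(lattice_x_values(1, 4, 6))     # 4y = x^2, n not prime
print(lattice_x_values(1, 12, 12))   # 12y = x^2
```

Comparing the second and third lines with the prime case suggests what the right criterion is when n is not prime.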


One might think this is all there is to say about points on the parabola. 
But before we go on, I want to point out something very interesting. 

Look at Figure 15.4.6. In both graphics we examine $2y = x^2$ and look at 
some lines. In the first one I create a line (solid red) through two integer points 
on the conic, in the other I create the tangent line through one integer point. 
Then in both cases I translate this line so it goes through the points (0,0) and 
$(-2,2)$ of the parabola. 




Figure 15.4.6 More integer points on $2y = x^2$ 


In both cases the dashed line intersects the parabola in a second point. 
But in these examples the new point has integer coordinates! Could this be 
coincidence? 


15.5 Making More and More and More Points 


Recall from Fact 15.1.5 that the following two strategies should give new ra- 
tional points on a conic section. We will give these strategies names. 


Algorithm 15.5.1 Getting New Rational Points. Two ways to obtain 
new rational points on a conic from rational points you already have are: 

• Connect two points with a secant line, and then make a line with the same 
slope but through another (rational) point. We call this adding points. 

• Find the tangent line through a point, and then make a line with the same 
slope but through another point. We call this doubling a point. 

Fact 15.5.2 The set of rational points on a conic section is an Abelian group. 
Assuming you have a point selected as an identity element, the group operation 
on two points P and Q is given by the first, “adding points”, operation 
of Algorithm 15.5.1. That is, you connect P and Q by a secant line of slope m, 
and then connect the identity to a fourth point P + Q with a line of slope m. 
Adding a point P to itself uses the slope of the tangent line at P, the second, 
“doubling points”, operation in Algorithm 15.5.1. 
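
Here is one way the secant and tangent constructions might be sketched in plain Python. Several things are assumptions of this example rather than part of the statement above: it works only on hyperbolas $x^2 - dy^2 = 1$ with d not a perfect square, it fixes (1,0) as the identity, and the name `add_points` is invented. The second intersection of the line $y = m(x - 1)$ with the curve works out to $\left(\frac{dm^2 + 1}{dm^2 - 1}, \frac{2m}{dm^2 - 1}\right)$.

```python
from fractions import Fraction

def add_points(P, Q, d=2):
    """Add P and Q on x^2 - d*y^2 = 1, using (1, 0) as the identity."""
    (x1, y1), (x2, y2) = P, Q
    if P == Q:
        if y1 == 0:
            return (Fraction(1), Fraction(0))      # vertical tangent line
        m = Fraction(x1) / (d * Fraction(y1))      # tangent slope x/(d*y)
    else:
        if x1 == x2:
            return (Fraction(1), Fraction(0))      # vertical secant line
        m = Fraction(y2 - y1) / Fraction(x2 - x1)  # secant slope
    # Second intersection of the line y = m(x - 1) with the hyperbola.
    denom = d * m * m - 1
    return ((d * m * m + 1) / denom, 2 * m / denom)

P = (3, 2)
print(add_points(P, P))                     # doubling P
print(add_points(P, add_points(P, P)))      # adding P to its double
print(add_points(P, (3, -2)))               # P plus its mirror image
```

Vertical lines are treated as returning the identity, which matches the idea that the vertical line through (1,0) is tangent there.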


15.5.1 Toward integer points 


More germane to our investigation, our limited experience in the previous 
section suggests these processes may often give you integer points. This is not 
a coincidence; in general, we should try to add or double points to get (new) 
integer points. 

As we are only guaranteed rational points, this doesn’t always work. Below, 
I try this on the ellipse from the beginning of Section 15.4. 

Figure 15.5.3 Trying to find more integer points on an ellipse 

Rotten luck. But in some circumstances, this strategy works very well 
indeed. Figure 15.5.4 gives an example of the simple family of hyperbolas 
$x^2 - dy^2 = 1$ where d = 2. 

Figure 15.5.4 The hyperbola $x^2 - 2y^2 = 1$ 

So let’s try the strategy of Algorithm 15.5.1. What happens when we take 
the tangent line to the curve $x^2 - 2y^2 = 1$ at the point (3,2), and then create 
a new line with the same slope through (1,0)? See Figure 15.5.5. 

Figure 15.5.5 The hyperbola $x^2 - 2y^2 = 1$ with more points 

It intersects in a new integer point, amazing! And if we repeat the process 
with the new point, we get another one; use the interact to see. Hmm ... 


d=2 
var('x,y') 
@interact 
def _(x_0=3, y_0=2, lattice=False, auto_update=False): 
    g(x,y)=x^2-d*y^2 
    # the new point obtained by doubling (x_0,y_0) 
    x_1,y_1=x_0^2+2*y_0^2,2*x_0*y_0 
    plot1 = implicit_plot(g-1,(x_0-4,x_1+4),(x_0-4,x_1+4), 
        plot_points = 200) 
    grid_pts = [[i,j] for i in [x_0-4..x_1+4] for j in 
        [x_0-4..x_1+4]] 
    plot_grid_pts = points(grid_pts, rgbcolor=(0,0,0), 
        pointsize=2) 
    lattice_pts = [coords for coords in grid_pts if 
        (coords[0]^2-d*coords[1]^2==1)] 
    plot_lattice_pts = points(lattice_pts, rgbcolor = 
        (0,0,1),pointsize=20) 
    # tangent line at (x_0,y_0), and its translate through (1,0) 
    line1 = plot((x_0/(2*y_0))*(x-x_0)+y_0,x_0-4,x_1+4, 
        color='red') 
    line2 = plot((x_0/(2*y_0))*(x-1),x_0-4,x_1+4, 
        color='red', linestyle='--') 
    if lattice: 
        show(plot1 + plot_grid_pts + plot_lattice_pts + 
            line1 + line2, figsize = [5,5], xmin = x_0-4, 
            xmax = x_1+4, ymin = y_0-4, ymax = y_1+4, 
            aspect_ratio=1) 
    else: 
        show(plot1+plot_lattice_pts+line1+line2, figsize = 
            [5,5], xmin = x_0-4, xmax = x_1+4, ymin = y_0-4, 
            ymax = y_1+4, aspect_ratio=1) 
    pretty_print(html("The new points are $x_1=%s$ and $y_1=%s$"%(x_1,y_1))) 

As it turns out, this is quite an old idea. Finding integer solutions to this 
hyperbola is called solving Pell’s equation, and has been studied in this form 
since the seventeenth century. But a process very similar to this was already 
rigorously discussed by Brahmagupta centuries before that! 


Historical remark 15.5.6 Brahmagupta. Brahmagupta⁹ is one of the 
earliest Indian mathematicians we have records from, though as was typical 
for mathematicians around the world for over a millennium, he was the head 
of an astronomical observatory. In addition to working on Pell’s equation (see 
for example Wikipedia¹⁰), we saw earlier the Brahmagupta-Fibonacci identity, 
and he also had prescient results in approximation and geometry. 


Historical remark 15.5.7 Stigler’s Law. In the event, Pell did not have 
anything to do with these equations; it was all based on a misunderstanding. 
But names stick. In mathematics this phenomenon of not naming things after 
the actual discoverer is sometimes called Boyer’s law, more generally Stigler’s 
law of eponymy (which are themselves self-referential). 


15.5.2 A surprising application 


The particular equation $x^2 - 2y^2 = 1$ was studied by Greeks such as Theon 
of Smyrna (though not in this generality) to shed light on $\sqrt{2}$. Why would 
solutions to this equation help? 

Well, imagine that (x, y) fulfill the equation. Then divide by $y^2$ and rearrange 
the original equation to get 

$$\frac{x^2}{y^2} = 2 + \frac{1}{y^2}.$$

If you can find a solution to this equation with a big y, then $\frac{x^2}{y^2}$ should be 
pretty close to 2, which means $x/y$ itself is pretty close to $\sqrt{2}$. 
Let’s see this in action. We already tried to find integer points on the curve 
in the following interact. 


var('x,y') 
@interact 
def _(viewsize=slider(10,20,1),d=2): 
    f(x,y)=x^2-d*y^2 
    plot1 = implicit_plot(f-1, (-viewsize,viewsize), 
        (-viewsize,viewsize), plot_points = 200) 
    grid_pts = [[i,j] for i in [-viewsize..viewsize] for j 
        in [-viewsize..viewsize]] 
    plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2) 
    # integer points on x^2 - d*y^2 = 1 inside the window 
    lattice_pts = [coords for coords in grid_pts if 
        (coords[0]^2-d*coords[1]^2==1)] 
    plot_lattice_pts = points(lattice_pts, rgbcolor = 
        (0,0,1),pointsize=20) 
    show(plot1+plot_grid_pts+plot_lattice_pts, figsize = 
        [5,5], xmin = -viewsize, xmax = viewsize, ymin = 
        -viewsize, ymax = viewsize, aspect_ratio=1) 
    pretty_print(html("Points on the curve $x^2-%sy^2=1$"%d)) 


The easy one for $d = 2$ was $(3,2)$. And after all, $\frac{3}{2} = 1.5$ isn’t too far from
$\sqrt{2} \approx 1.414$. There seems to be another point if we zoom out, but that would
be a tedious way to compute them...


Example 15.5.8 What if we double the point and take the tangent at $(3,2)$?
(See Algorithm 15.5.1.) Then we take that slope, and make a new line through
the “base” point (in this case, $(1,0)$).

Then the next point we get is $(17,12)$. (See Exercise 15.7.14.) Indeed,
$17^2 - 2\cdot 12^2 = 1$ and $17/12 \approx 1.417$, already correct to three significant digits.
Those Greeks!

⁹ mathshistory.st-andrews.ac.uk/Biographies/Brahmagupta/
¹⁰ en.wikipedia.org/wiki/Brahmagupta#Pell's_equation


15.6 The Algebraic Story 


15.6.1 Computing the hyperbola 


Now we can use our geometric intuition to reveal what is happening alge- 
braically here. The algebra is not hard, but a little dense; follow this proof 
closely. 


Proposition 15.6.1 Doubling integer points on the hyperbola $x^2 - 2y^2 = 1$
yields more integer points.

Proof. Algebraically, if $x^2 - 2y^2 = 1$, then the slope of the tangent line at any point $(x_0, y_0)$
other than $(\pm 1,0)$ is given by implicit differentiation to be $y' = \frac{x_0}{2y_0}$. So we
start there.

What is the line through $(1,0)$ with that same slope? It’s

$$y = \frac{x_0}{2y_0}(x-1),$$

of course. Let’s check where else this intersects the hyperbola, if at all.
Start off with plugging the line into the hyperbola:

$$x^2 - 2y^2 - 1 = x^2 - 2\left(\frac{x_0}{2y_0}(x-1)\right)^2 - 1 = \left(1 - \frac{x_0^2}{2y_0^2}\right)x^2 + \left(\frac{x_0^2}{y_0^2}\right)x + \left(-1 - \frac{x_0^2}{2y_0^2}\right) = 0$$

This can be simplified and then solved, unbelievably (via the quadratic formula
or factoring out $x - 1$):

$$(2y_0^2 - x_0^2)x^2 + 2x_0^2 x + (-2y_0^2 - x_0^2) = 0$$

$$x = \frac{2x_0^2 + 4y_0^2}{2(x_0^2 - 2y_0^2)} = x_0^2 + 2y_0^2$$

where the last equality uses $x_0^2 - 2y_0^2 = 1$. Finally, do a slick substitution of the original point:

$$y = \frac{x_0}{2y_0}(x - 1) = \frac{x_0}{2y_0}\left(x_0^2 + 2y_0^2 - (x_0^2 - 2y_0^2)\right) = 2x_0y_0.$$

To recap, given a point $(x_0, y_0)$ we have achieved a new point $(x_0^2 + 2y_0^2, 2x_0y_0)$. ∎

Now let’s try this with actual points in Sage! I have provided both a 
numerical and a graphical interact. 


    @interact
    def _(x_0=17, y_0=12):
        x_1=x_0^2+2*y_0^2
        y_1=2*x_0*y_0
        pretty_print(html("Initial point was $(%s,%s)$; new point is $(%s,%s)$."%(x_0,y_0,x_1,y_1)))
        pretty_print(html(r"And indeed $%s^2-2\cdot%s^2$ equals $%s$"%(x_1,y_1,x_1^2-2*y_1^2)))

    d=2
    var('x,y')
    @interact
    def _(x_0=3, y_0=2, lattice=False, auto_update=False):
        g(x,y)=x^2-d*y^2
        x_1,y_1=x_0^2+2*y_0^2,2*x_0*y_0
        plot1 = implicit_plot(g-1, (x_0-4,x_1+4),
            (x_0-4,x_1+4), plot_points = 200)
        grid_pts = [[i,j] for i in [x_0-4..x_1+4] for j in
            [x_0-4..x_1+4]]
        plot_grid_pts = points(grid_pts, rgbcolor=(0,0,0),
            pointsize=2)
        lattice_pts = [coords for coords in grid_pts if
            (coords[0]^2-d*coords[1]^2==1)]
        plot_lattice_pts = points(lattice_pts, rgbcolor = (0,0,1),
            pointsize=20)
        # Tangent line at (x_0,y_0), and the parallel line through (1,0)
        line1 = plot((x_0/(2*y_0))*(x-x_0)+y_0, x_0-4, x_1+4,
            color='red')
        line2 = plot((x_0/(2*y_0))*(x-1), x_0-4, x_1+4,
            color='red', linestyle='--')
        if lattice:
            show(plot1 + plot_grid_pts + plot_lattice_pts +
                line1 + line2, figsize = [5,5], xmin = x_0-4,
                xmax = x_1+4, ymin = y_0-4, ymax = y_1+4,
                aspect_ratio=1)
        else:
            show(plot1+plot_lattice_pts+line1+line2, figsize =
                [5,5], xmin = x_0-4, xmax = x_1+4, ymin = y_0-4,
                ymax = y_1+4, aspect_ratio=1)
        pretty_print(html("The new points are $x_1=%s$ and $y_1=%s$"%(x_1,y_1)))


Awesome! 
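If you would like to see the doubling map of Proposition 15.6.1 outside of an interact, here is a minimal sketch in plain Python rather than Sage. The names `double_point` and `doubling_chain` are just illustrative choices, not anything from Sage, and the loop only prints the ratio $x/y$ so you can watch it approach $\sqrt{2}$.

```python
def double_point(x0, y0):
    """Doubling map from Proposition 15.6.1 on x^2 - 2y^2 = 1."""
    return x0 * x0 + 2 * y0 * y0, 2 * x0 * y0


def doubling_chain(x0, y0, steps):
    """Start at (x0, y0) and double `steps` times, returning every point visited."""
    points = [(x0, y0)]
    for _ in range(steps):
        points.append(double_point(*points[-1]))
    return points


if __name__ == "__main__":
    for x, y in doubling_chain(3, 2, 3):
        # Each point should still satisfy x^2 - 2y^2 = 1.
        assert x * x - 2 * y * y == 1
        print((x, y), x / y)
```

Since Python integers have arbitrary precision, the coordinates stay exact no matter how many times you double.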


15.6.2 Yet more number systems 


As mentioned earlier, Brahmagupta knew how to do this. Of course, he did
it both without our geometric interpretation (which was only made possible
by Descartes and Fermat’s introduction of coordinate systems, though at least
Fermat when he examined these was not thinking geometrically) and also without
the benefit of symbolically representing $\sqrt{2}$, which provides this alternate
description of what we did.


Fact 15.6.2 If $(x_0, y_0)$ is a solution to $x^2 - 2y^2 = 1$, then so is $(x_1, y_1)$ where

$$(x_0 + \sqrt{2}y_0)^2 = x_1 + \sqrt{2}y_1.$$

If you were to do the algebra out here to get a formula for $(x_1, y_1)$ in terms of
$(x_0, y_0)$, you’d get exactly the same answer as we did above (Exercise 15.7.16).
Moreover, notice that once again we seem to have created a new number
system, though this time we have added to the integers the square root of a
positive, not negative number! (And yes, it turns out that finding solutions to
this equation is related to $\mathbb{Z}[\sqrt{2}]$...¹¹)


Furthermore, the “point doubling” procedure precisely corresponds to multiplying
a group element by 2. That is to say:

$[5] + [5] \equiv [3] \pmod{7}$ is the same type of operation as $(3, 2) + (3, 2) = (17, 12)$.

It turns out that there is a more general formula that corresponds to taking
the line through two (integer) points and then creating a line with the same
slope that goes through the original point $(1,0)$:


Example 15.6.3 If both $(x_1, y_1)$ and $(x_2, y_2)$ are solutions of $x^2 - 2y^2 = 1$,
then so is
$$(x_1x_2 + 2y_1y_2,\; x_1y_2 + y_1x_2).$$

If you apply this to two points opposite each other on the same branch of the
hyperbola, such as $(3,2)$ and $(3,-2)$, you will get

$$(3\cdot 3 + 2\cdot 2\cdot(-2),\; 3\cdot(-2) + 2\cdot 3) = (1, 0).$$

In this sense, if we treat $(1,0)$ as an identity element in the sense of group
identity, then $(3,-2)$ may be considered the additive inverse of $(3, 2)$.


    @interact(layout=[['x_0','y_0'],['x_1','y_1'],['auto_update']])
    def _(x_0=3, y_0=2, x_1=17, y_1=12, auto_update=False):
        if x_0!=x_1:
            # Two different points: use the general formula
            x_3,y_3=x_1*x_0+2*y_1*y_0,x_1*y_0+y_1*x_0
            pretty_print(html("Initial points were $(%s,%s)$ and $(%s,%s)$; new point is $(%s,%s)$."%(x_0,y_0,x_1,y_1,x_3,y_3)))
            pretty_print(html(r"And indeed $%s^2-2\cdot%s^2$ equals $%s$"%(x_3,y_3,x_3^2-2*y_3^2)))
        elif y_0==y_1:
            # The same point twice: use the doubling formula
            x_3,y_3=x_0^2+2*y_0^2,2*x_0*y_0
            pretty_print(html("Initial points were $(%s,%s)$ and $(%s,%s)$; new point is $(%s,%s)$."%(x_0,y_0,x_1,y_1,x_3,y_3)))
            pretty_print(html(r"And indeed $%s^2-2\cdot%s^2$ equals $%s$"%(x_3,y_3,x_3^2-2*y_3^2)))
        else:
            print("Input correct numbers!")


This procedure ends up working for any $n$. Just change all the 2s above to
$n$s. Let’s see this “by hand” for $n = 3$, where we solve $x^2 - 3y^2 = 1$ with

$$2^2 - 3\cdot 1^2 = 1.$$

That is, I use
$$x' = x^2 + 3y^2 \text{ and } y' = 2xy.$$


    @interact
    def _(x=2,y=1,auto_update=False):
        x,y=x*x+3*y*y,x*y+y*x
        pretty_print(html(r"$%s^2-3\cdot%s^2=%s$"%(x,y,x^2-3*y^2)))
        pretty_print(html("New point is $(%s,%s)$"%(x,y)))


¹¹ For more details connecting the topics of this section more directly to abstract algebra,
see [E.2.7, Sections 5.3-5.4]; for a more geometric viewpoint, see the same author’s Numbers
and Geometry, Chapters 4 and 8.
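To make the “addition” of points concrete, here is a small sketch in plain Python (not Sage). The function name `add_points` is hypothetical; it simply multiplies $x_1 + \sqrt{2}y_1$ by $x_2 + \sqrt{2}y_2$ and reads off the two coefficients, which is one way to view the formula in Example 15.6.3.

```python
def add_points(p, q):
    """Combine two points on x^2 - 2y^2 = 1 using the formula of Example 15.6.3.

    This is the same as multiplying (x1 + sqrt(2)*y1) by (x2 + sqrt(2)*y2).
    """
    (x1, y1), (x2, y2) = p, q
    return x1 * x2 + 2 * y1 * y2, x1 * y2 + y1 * x2


if __name__ == "__main__":
    identity = (1, 0)
    p = (3, 2)
    print(add_points(p, p))          # doubling (3, 2)
    print(add_points(p, (3, -2)))    # a point combined with its inverse
    print(add_points(p, identity))   # combining with (1, 0) changes nothing
```

Unlike the interact above, this sketch uses the general formula in every case, since doubling is just the special case where both inputs agree.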
15.6.3 The general solution (any n)


The general solution, given two points $(x_1, y_1)$ and $(x_2, y_2)$, would be, for $n > 0$
and not a perfect square,

$$x' = x_1x_2 + ny_1y_2 \text{ and } y' = x_1y_2 + x_2y_1.$$

Even more generally, the same formula works for combining solutions of
two different Pell-like equations.


Fact 15.6.4
If $x_0^2 - ny_0^2 = k$ and $x_1^2 - ny_1^2 = \ell$,
then $x = x_0x_1 + ny_0y_1$, $y = x_0y_1 + y_0x_1$ solves $x^2 - ny^2 = k\ell$.

Proof. See Exercise 15.7.17. ∎

This is particularly nice if $k = \ell = -1$, because getting a solution for that
would then give a solution to the Pell equation!
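Here is a quick numerical sanity check of Fact 15.6.4, again as a plain Python sketch. The helper names `pell_value` and `combine` are made up for this illustration, and the sample points are just ones that are easy to verify by hand.

```python
def pell_value(x, y, n):
    """Return x^2 - n*y^2, the right-hand side that (x, y) solves."""
    return x * x - n * y * y


def combine(p0, p1, n):
    """Combine solutions of x^2 - n*y^2 = k and x^2 - n*y^2 = l as in Fact 15.6.4."""
    (x0, y0), (x1, y1) = p0, p1
    return x0 * x1 + n * y0 * y1, x0 * y1 + y0 * x1


if __name__ == "__main__":
    # (1, 1) solves x^2 - 2y^2 = -1; combining it with itself should give k*l = 1.
    print(combine((1, 1), (1, 1), 2))
    # (3, 1) solves x^2 - 10y^2 = -1, so the same trick works for n = 10.
    print(combine((3, 1), (3, 1), 10))
    # Mixed right-hand sides: (1, 1) gives -1 and (2, 1) gives 2 when n = 2.
    x, y = combine((1, 1), (2, 1), 2)
    print((x, y), pell_value(x, y, 2))
```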

Brahmagupta used techniques analogous to these (and more sophisticated
ones) to solve very hard cases, as did the later English mathematicians
who answered the following challenges of Fermat.


Question 15.6.5 Find nontrivial solutions to these equations:
• $x^2 - 61y^2 = 1$.
• $x^2 - 109y^2 = 1$.
Solution. Fermat knew that the smallest solution to the second one is
$$x = 158070671986249, \quad y = 15140424455100,$$

which we can check below. The great mathematician André Weil [E.5.8, II.XIII]
finds that Fermat’s comment to his counterparts that the numbers 61 and 109
were ones selected to not give too much trouble was ‘mischievously’ said; do
you agree?


    158070671986249^2-109*15140424455100^2


Considering that Brahmagupta says that finding the solution $x = 1151, y = 120$
to the equation $x^2 - 92y^2 = 1$ within a year proved the person “was a
mathematician”, we can be very thankful for computers!
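To get a feel for why these challenges were hard, here is a naive search sketched in plain Python. The function name `smallest_pell_solution` and the cutoff `y_limit` are assumptions made for this illustration; the method is simple trial of $y = 1, 2, 3, \ldots$, not the technique Brahmagupta or Fermat's correspondents used.

```python
from math import isqrt


def smallest_pell_solution(n, y_limit=10**6):
    """Search y = 1, 2, ... for the first y making n*y^2 + 1 a perfect square.

    Returns (x, y) with x^2 - n*y^2 = 1, or None if no y up to y_limit works.
    """
    for y in range(1, y_limit + 1):
        target = n * y * y + 1
        x = isqrt(target)
        if x * x == target:
            return x, y
    return None


if __name__ == "__main__":
    print(smallest_pell_solution(2))
    print(smallest_pell_solution(92))
    # Checking a known solution is instant, even when finding it is not.
    assert 158070671986249**2 - 109 * 15140424455100**2 == 1
```

Brahmagupta's example with 92 falls out quickly, but with the default cutoff this search gives up (returns `None`) on 61 and 109, which hints at how well chosen Fermat's numbers were.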


15.7 Exercises 


1. Do the algebra which we skipped in Fact 15.1.2.
2. Do the algebra which we skipped in Example 15.1.6.
Exercise Group. Find a parametrization (similar to Fact 15.1.2) for rational
points on the following curves.
3. The ellipse $x^2 + 3y^2 = 4$.
4. The hyperbola $x^2 - 2y^2 = 1$.

5. Finish proving (Fact 15.1.8) that $x^2 + y^2 = 15$ cannot have any rational
points, including the claim about writing $x$ and $y$ in terms of $p,q,r$.

6. Finish the proof that $x^3 - 117y^3 = 5$ has no integer solutions, looking
modulo nine.

7. Show that the equation $x^3 = y^2 - 999$ has no integer solutions. (This is
also Exercise 7.7.14.)

8. Fill in some (or all) of the details of Theorem 15.3.4.

9. Use Theorem 15.3.4 to come up with three Mordell curves we haven’t yet
mentioned which have no integer solutions.

10. Fill in the details of divisibility to finish Euler’s ‘proof’ of Fact 15.3.5.

11. Look up the current best known bound on the number of integer points
on a Mordell equation curve.

12. Get the tangent line at $(2,1)$ to the Dudeney curve (see Question 15.2.1)
and find the point of intersection; why can it not give an answer to the
original problem?

13. Research Boyer’s or Stigler’s laws. What is the most egregious example
of this, in your opinion?

14. Fill in the details of Example 15.5.8, and then find an integer point with
even bigger values than in that example.

15. Show that the Pell equation with $d = 1$ ($x^2 - y^2 = 1$) has only two integer
solutions. Generalize this to when $d$ happens to be a perfect square.

16. Show that algebraically expanding the identity in Fact 15.6.2 to solve for
$x_1, y_1$ yields the formulas for $x$ and $y$ in the proof of Proposition 15.6.1.

17. Verify that if
$$x_0^2 - ny_0^2 = k \text{ and } x_1^2 - ny_1^2 = \ell$$
then
$$x = x_0x_1 + ny_0y_1, \quad y = x_0y_1 + y_0x_1 \text{ solves } x^2 - ny^2 = k\ell.$$

18. Explain why the previous problem reduces to the method from Section 15.5
where we were trying to use a tangent line to find more integer
solutions.

19. Find a non-trivial integer solution to $x^2 - 17y^2 = -1$, and use it to get a
nontrivial solution to $x^2 - 17y^2 = 1$.

20. Recreate the geometric constructions in Section 15.5 using tangent lines
on the hyperbola with $x^2 - 5y^2 = 1$, and use it to find three (positive)
integer points on this curve with at least two digits for both $x$ and $y$. Yes,
you will have to find a non-trivial solution on your own; it’s not hard,
there is one with single digits.

21. Recall Remark 14.1.9 that the set of primitive Pythagorean triples can
form a group, which evidently might be related to the graphs of circles
$x^2 + y^2 = c^2$. Find the article [E.7.38] connecting the same set, as a
group under a different multiplication, to the hyperbolas $x^2 - y^2 = a^2$,
and compare this to the story in Section 15.6. Which one seems more
interesting, or more computable?


Summary: Points on Curves


There is surprising depth, but also surprisingly accessible questions, when
investigating integer and rational points on simple nonlinear curves.

1. We start with rational points on conics. Fact 15.1.2 gives a famous parametrization
of the points on the unit circle, though we also see in Fact 15.1.8
that some conics don’t have any rational points at all.

2. In Section 15.2 we explore a few more fun, though less crucial, cubic
questions.

3. The next section begins our exploration of integer points, including facts
such as Fact 15.3.5 about some curves with none or one.

4. Then in Section 15.4 the conic (quadratic) cases begin.

5. We use hyperbolas to bring in the wonderful geometric Algorithm 15.5.1
for using existing points to get us more and more of them.

6. Can this strategy be made algebraic? The final section does so, culminating
in the most general proposition Fact 15.6.4 of this type we present.

The Exercises focus a lot on filling in proof details, as well as the excitement
of exploring for actual integer points.
Chapter 16


Solving Quadratic Congruences 


We have been doing a lot of work until now with squares. It is almost time 
to see one of the great theorems of numbers, which gives us great insight into 
the nature of squares in the integer world — and whose easiest proof involves 
lattice points! 

This theorem (Quadratic Reciprocity, in the next chapter) will come from 
our trying to find the solution to a useful general problem, which I like to 
think of as the last piece of translating high school algebra to the modular 
world. That is the task of solving quadratic congruences, the modular 
equivalent to the well-known quadratic equations. 

Recall that a (single-variable) quadratic expression is one of the form $ax^2 + bx + c$,
and a quadratic equation would be of the form $ax^2 + bx + c = 0$. In
high school, students worldwide typically use the so-called quadratic formula
to solve this:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

Indeed, this formula goes back in one form or another nearly four millennia
(see the end of this article¹ for just one reference to an Old Babylonian problem
of this type).


Example 16.0.1 The presence of the square root in the general formula does
not mean every solution requires irrational numbers. Often there are solutions
of simpler types.

We can solve $x^2 - 5x + 4 = 0$ over the positive integers fairly easily, as

$$x^2 - 5x + 4 = (x-4)(x-1) = 0 \text{ implies } x = 4 \text{ or } x = 1.$$

The equation $4x^2 + 4x + 1 = 0$ requires us to move to the rational numbers
($\mathbb{Q}$), since

$$4x^2 + 4x + 1 = (2x+1)^2 = 0 \text{ implies } 2x + 1 = 0 \text{ so } x = -\frac{1}{2}.$$

On the other hand, sometimes we need to even go beyond the real numbers.
The solutions of something like $x^2 + 5x + 5 = 0$ will still be real, as the radical
in the quadratic formula gives $\sqrt{5^2 - 4\cdot 1\cdot 5} = \sqrt{5}$. But solving $x^2 + 5x + 7 = 0$
requires

$$\frac{-5 \pm \sqrt{5^2 - 4\cdot 1\cdot 7}}{2\cdot 1} = \frac{-5 \pm \sqrt{-3}}{2},$$

which only makes sense in the complex numbers $\mathbb{C}$ (recall Definition 14.1.2).


¹ www-groups.dcs.st-and.ac.uk/history/HistTopics/Babylonian_mathematics.html
The previous example may be considered in a different way. Namely, for
different number systems like $\mathbb{Z}$ or $\mathbb{R}$, we may ask which quadratic equations
have a solution in the system. Then if we let a quadratic congruence be
something of the form

$$ax^2 + bx + c \equiv 0 \pmod{n}$$

we can ask for which groups $\mathbb{Z}_n$ there exists a solution!


Example 16.0.2 Since $x^2 - 5x + 4 = (x-4)(x-1)$, we should be able to solve
it as a congruence for any $n$, but we might wonder whether the other examples
would always have solutions, since they don’t have integer solutions.

Consider $4x^2 + 4x + 1 \equiv 0 \pmod{5}$; this is equivalent to $-x^2 - x + 1 \equiv 0$,
and simple guess and check reveals that $x = 2$ is a solution!

We leave it to the reader to check that $x^2 + 5x + 5 \equiv 0$ has a (very) simple
solution if considered modulo $n = 5$. Perhaps most interestingly, $x^2 + 5x + 7 \equiv 0 \pmod{n}$
has solutions for no fewer than four different $1 < n < 20$. (See
Exercise 16.8.1.)
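The “guess and check” in the example is easy to automate. Here is a minimal brute-force sketch in plain Python; the name `solve_quadratic_congruence` is just an illustrative choice, and the approach (try every residue) is only sensible for small moduli.

```python
def solve_quadratic_congruence(a, b, c, n):
    """Return all x in 0..n-1 with a*x^2 + b*x + c congruent to 0 modulo n."""
    return [x for x in range(n) if (a * x * x + b * x + c) % n == 0]


if __name__ == "__main__":
    # The example 4x^2 + 4x + 1 = 0 (mod 5) from above.
    print(solve_quadratic_congruence(4, 4, 1, 5))
    # x^2 - 5x + 4 factors over the integers, so x = 1 and x = 4 always work.
    print(solve_quadratic_congruence(1, -5, 4, 7))
```

The same function can be pointed at the other congruences in this example when you try Exercise 16.8.1.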

This chapter will see how far we can extend all of these concepts to the 
modular world. We will begin by considering the notion of square root in that 
context. 


16.1 Square Roots 


16.1.1 Recalling existing answers 


To use the quadratic formula in full generality, one needs to know whether one 
can take square roots (for example, if they are complex, you won’t have a real 
solution). So too, our first task for modular arithmetic will be finding such 
square roots. 

Given our work in Chapter 7, e.g. Fact 7.2.2, it should be sufficient to solve

$$x^2 \equiv n \pmod{p^e},$$

finding square roots modulo $p^e$ where $p$ is prime. In most cases, we can proceed
as in Examples 7.2.5 and 7.2.6 to reduce our problem to finding solutions to
$x^2 \equiv n \pmod{p}$ and then lifting. Since $f'(x) = 2x$, Hensel’s Lemma is not
strong enough to guarantee we can always do that when $p = 2$, but one can
use Remark 7.2.7 to analyze this case as well.

In this chapter we will mostly ignore this caveat about powers, and focus 
on solving for square roots modulo a prime. In fact, we have already solved 
some simple cases! We restate here a fact in the proof of Theorem 12.3.2 and 
the combination of Fact 13.3.2 and Lemma 13.3.3. 


Fact 16.1.1 The congruence $x^2 \equiv 1 \pmod{p}$, for $p$ prime, always has the
solution(s) $x \equiv \pm 1$. So if $p = 2$ there is one solution, and otherwise 1 has two
square roots modulo $p$.


Fact 16.1.2 The congruence $x^2 \equiv -1 \pmod{p}$, for $p$ prime, does not always
have solutions. It does precisely when $p = 2$ (where $x = 1$) and when $p \equiv 1 \pmod{4}$,
which has the two solutions

$$x \equiv \pm\left(\frac{p-1}{2}\right)!$$

where again the exclamation point here indicates the factorial.
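The factorial formula in Fact 16.1.2 can be checked directly for small primes. The following plain Python sketch uses a hypothetical helper name, `sqrt_of_minus_one`, and assumes its input is a prime congruent to 1 modulo 4 (it does not test primality).

```python
from math import factorial


def sqrt_of_minus_one(p):
    """For a prime p = 1 (mod 4), return both square roots of -1 modulo p.

    Uses x = ((p-1)/2)! from Fact 16.1.2; p is assumed to be prime.
    """
    if p % 4 != 1:
        raise ValueError("this formula needs p congruent to 1 modulo 4")
    x = factorial((p - 1) // 2) % p
    return tuple(sorted((x, p - x)))


if __name__ == "__main__":
    for p in (5, 13, 17, 29):
        roots = sqrt_of_minus_one(p)
        # Both roots should square to -1, that is, to p - 1.
        assert all((r * r) % p == p - 1 for r in roots)
        print(p, roots)
```

This is a pleasant formula but not an efficient one, since the factorial grows very quickly with $p$.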
16.1.2 Finding more answers


We know the full answer (any modulus) for square roots of +1 from Fact 7.3.1. 
What about finding out when —1 has a square root for non-prime moduli? We 
can ask Sage about this: 


    var('x')
    @interact
    def _(n=50):
        for i in [2..n]:
            # Collect the solutions of x^2 = -1 modulo i
            sols = [sol[0] for sol in solve_mod([x^2==-1],i)]
            l = len(sols)
            if l!=0:
                pretty_print(html("$x^2=-1\\text{ (mod }%s)$ has $%s$ solutions, $%s$"%(i,l,sols)))


Let’s see a few examples of how to be more systematic about this. 


Example 16.1.3 Prime power modulus. For instance, let’s go from $p$ to
$p^e$ by trying a bit of Example 7.2.5 from earlier. Here, $f(x) = x^2 + 1$ is what
we want a solution for. If we are looking (mod 25), then we already know that
(mod 5) we have $x = 2$ as a solution. Then a solution (mod 25) will look like
$2 + k(5)$ (again recall Example 7.2.5).

Remembering that $f'(x) = 2x$, in fact $k$ will satisfy

$$\frac{2^2 + 1}{5} + k(2\cdot 2) \equiv 0 \pmod{5}$$

which is $1 + 4k \equiv 0$, which has solution $k \equiv 1$; hence a solution (mod 25) should
be $2 + 1(5) = 7$. And indeed $7^2 + 1 = 50$ is divisible by 25 as expected!
(Notice that $5 \nmid f'(2) = 4$, so the technical condition is satisfied; otherwise
we’d have $1 \equiv 0$ to solve!)
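The lifting step in this example can be written out as a short routine. This is only a sketch in plain Python (version 3.8 or later, for the modular inverse via `pow`); the name `hensel_lift` and its argument order are assumptions for illustration, and it handles just the case where $p \nmid f'(x)$.

```python
def hensel_lift(f, df, x, p, e):
    """Lift a root x of f modulo p^e to a root modulo p^(e+1).

    Solves f(x)/p^e + k*df(x) = 0 (mod p) for k and returns x + k*p^e.
    Requires that p does not divide df(x).
    """
    pe = p**e
    if f(x) % pe != 0:
        raise ValueError("x is not a root modulo p^e")
    if df(x) % p == 0:
        raise ValueError("p divides f'(x), so this simple lift does not apply")
    k = (-(f(x) // pe) * pow(df(x), -1, p)) % p
    return (x + k * pe) % (pe * p)


if __name__ == "__main__":
    f = lambda x: x * x + 1
    df = lambda x: 2 * x
    x25 = hensel_lift(f, df, 2, 5, 1)     # from mod 5 to mod 25
    x125 = hensel_lift(f, df, x25, 5, 2)  # from mod 25 to mod 125
    print(x25, x125)
```

Applying the function repeatedly walks up the powers of 5 one step at a time, as in Example 7.2.5.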


Example 16.1.4 Composite moduli. Suppose I want solutions to $x^2 \equiv -1 \pmod{14}$.
I should immediately note that although $x^2 \equiv -1 \pmod{2}$ has a
solution, $x^2 \equiv -1 \pmod{7}$ does not (it’s a prime of the form $4k + 3$) so there
will be no solutions modulo 14 either.

But if I am looking (mod 65), since $65 = 5\cdot 13$ and $x^2 \equiv -1$ has solutions
both (mod 5) and (mod 13), I can use the Chinese Remainder Theorem to
combine them:

• $x \equiv 2 \pmod{5}$
• $x \equiv 5 \pmod{13}$

We combine them thus:

$$x \equiv 2\cdot 13\cdot(13^{-1} \bmod 5) + 5\cdot 5\cdot(5^{-1} \bmod 13)$$
$$\equiv 26\cdot 2 + 25\cdot 8 = 252 \equiv 57 \pmod{65}$$

And that also is consistent with the computations in the Sage interact above!
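The combination step above is mechanical enough to sketch in plain Python (3.8 or later). The helper name `crt_pair` is hypothetical; it implements exactly the two-modulus formula used in the example and assumes the moduli are coprime.

```python
def crt_pair(a1, m1, a2, m2):
    """Return the x modulo m1*m2 with x = a1 (mod m1) and x = a2 (mod m2).

    Assumes gcd(m1, m2) = 1, so that both modular inverses exist.
    """
    term1 = a1 * m2 * pow(m2, -1, m1)
    term2 = a2 * m1 * pow(m1, -1, m2)
    return (term1 + term2) % (m1 * m2)


if __name__ == "__main__":
    print(crt_pair(2, 5, 5, 13))
    # All square roots of -1 modulo 65, from the roots +-2 (mod 5) and +-5 (mod 13).
    roots = sorted(crt_pair(r, 5, s, 13) for r in (2, 3) for s in (5, 8))
    print(roots)
```

Running over all four sign combinations gives every square root of $-1$ modulo 65, not just the one computed by hand.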


As we can see, we can usually obtain a complete answer to this and similar 
questions by using Theorem 7.2.3 and Theorem 5.3.2. 


Algorithm 16.1.5 To solve a polynomial congruence modulo a given modulus $n$, the
following information suffices, granted the technical condition for the first bullet
that $\gcd(p, f'(x)) = 1$ for a solution $x$ modulo a prime factor² $p \mid n$.

• If we can solve, for a given prime $p$,

$$f(x) \equiv 0 \pmod{p},$$

we can solve

$$f(x) \equiv 0 \pmod{p^e}.$$

• If we can solve, for coprime integers $p$ and $q$, $f(x) \equiv 0 \pmod{p}$ and
(mod $q$), then we can solve

$$f(x) \equiv 0 \pmod{pq}.$$

Returning to square roots, it should be clear that, as far as just determining
whether a solution exists, all we need to examine is prime moduli and powers
of two. Everything else is taken care of by previous work.

To avoid the complication of powers of two, and because of a similar complication
in completing the square in Algorithm 16.2.4, in the rest of this chapter
and the next we will focus on the case of odd moduli. It can be a useful classroom
exercise to explore both when solutions exist and the actual square roots
modulo $2^e$, for which see Exercise 16.8.17; for full details see [E.2.1, Theorem
7.14, Examples 7.13-14].


16.2 General Quadratic Congruences 


The expression $x^2 + k$ is not the only quadratic game in town. What about other
quadratics, such as $x^2 + 2x + 1$? It turns out that we can use something you
are already familiar with to reduce the whole game to the following.


Question 16.2.1 For what primes $p$ is there a solution to $x^2 \equiv k \pmod{p}$?


Let’s confirm this with a look at general quadratic congruences.
First let’s try computing. As an example, take $x^2 - 2x + 3 \pmod{9}$. The
Sage function solve_mod works, if a little naively.


    solve_mod([x^2-2*x+3==0],9)


    [(5,), (6,)]


Sage note 16.2.2 Commands of more sophistication. Notice that the
solve_mod command is more complicated than divmod. solve_mod returns a
list of tuples, where a tuple of length one has a comma to indicate it’s a tuple.
(If you tried to solve a multivariate congruence you would find it returns a
longer tuple.)

The result shows that $x^2 - 2x + 3 \equiv 0 \pmod{9}$ has two solutions. But how
might I solve a general quadratic congruence?


16.2.1 Completing the square solves our woes 


The key is completing the square! First let’s do an example. 


² Why not prime power factor? Because in the construction of Theorem 7.2.3 solutions
modulo $p^e$ are also solutions modulo $p$. So if $p$ divides $f'(x_e)$ for a solution $x_e$, it will already
divide $f'(x)$ for some solution modulo $p$.

Example 16.2.3 Completing the square for $x^2 - 2x + 3$ is done by

$$x^2 - 2x + 3 = \left(x^2 - 2x + \left(\frac{2}{2}\right)^2\right) + 3 - \left(\frac{2}{2}\right)^2 = (x-1)^2 + 2,$$

so solving the original congruence reduces to solving

$$(x - 1)^2 \equiv -2 \pmod{n}.$$

Then assuming I have a square root $s$ of $-2 \pmod{n}$, I just compute $s + 1$
and I’m done! Go ahead and try this for a few different $n$, including of course
$n = 9$, with Sage.


    solve_mod([x^2==-2],9)


    [(4,), (5,)]


Should you not have particularly enjoyed completing the square in the past, 
here is the basic idea for modulus n. 


Algorithm 16.2.4 Completing the square modulo n. To complete the
square for $ax^2 + bx + c \equiv 0$, the key thing to keep in mind is that we do not
actually divide by $2a$, but instead multiply by $(2a)^{-1}$. Here are the steps.

• Multiply by four and $a$: $4a^2x^2 + 4abx + 4ac \equiv 0$

• Factor the square: $(2ax + b)^2 - b^2 + 4ac \equiv 0$

• Isolate the square: $(2ax + b)^2 \equiv b^2 - 4ac$

So to solve in this way, we’ll need that $2a$ is a unit (more or less requiring that
$n$ is odd), and then to find all square roots of $b^2 - 4ac$ in $\mathbb{Z}_n$.


Fact 16.2.5 Given that $\gcd(2a,n) = 1$, the full solution to
$$ax^2 + bx + c \equiv 0 \pmod{n}$$
is the same as the set of solutions to
$$x \equiv (2a)^{-1}(s - b) \pmod{n}, \text{ where } s^2 \equiv b^2 - 4ac \pmod{n}.$$

Note that this means $s^2 \equiv b^2 - 4ac$ must have a solution.
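Here is one way the recipe of Fact 16.2.5 might look in plain Python (3.8 or later). The function name `solve_by_completing_square` is made up for this sketch, and the square roots $s$ are found by brute force over all residues, which is an assumption of convenience rather than an efficient method.

```python
from math import gcd


def solve_by_completing_square(a, b, c, n):
    """Solve a*x^2 + b*x + c = 0 (mod n) via x = (2a)^(-1) * (s - b).

    Here s runs over the square roots of b^2 - 4ac modulo n (found by brute force).
    Requires gcd(2a, n) = 1 so that (2a)^(-1) exists.
    """
    if gcd(2 * a, n) != 1:
        raise ValueError("2a must be a unit modulo n")
    disc = (b * b - 4 * a * c) % n
    inv_2a = pow(2 * a, -1, n)
    square_roots = [s for s in range(n) if (s * s - disc) % n == 0]
    return sorted({(inv_2a * (s - b)) % n for s in square_roots})


if __name__ == "__main__":
    print(solve_by_completing_square(1, -2, 3, 9))  # compare with solve_mod above
    print(solve_by_completing_square(1, 3, 5, 5))
    print(solve_by_completing_square(1, 3, 5, 7))   # empty if no square root exists
```

An empty list means the discriminant has no square root for that modulus, which is exactly the situation the rest of the chapter sets out to understand.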


Example 16.2.6 Let’s do all this with $x^2 + 3x + 5 \equiv 0 \pmod{n}$. Notice
that $b^2 - 4ac = 9 - 20 = -11$, so this equation does not have a solution over
the integers, or indeed over the real numbers. Does it have a solution in $\mathbb{Z}_n$
for some $n$, though? Try some examples by hand before using the following
interact.

    L = [(n, solve_mod([x^2==-11],n)) for n in prime_range(3,100)]
    for l in L:
        L1 = [m[0] for m in l[1]]
        modulus = l[0]
        pretty_print(html(r"Modulo $%s$, $x^2\equiv -11$ has the solutions: %s"%(modulus,L1)))
        if len(L1)!=0:
            try:
                # x = (2a)^(-1)(s-b) with a=1, b=3
                LS = [mod(2*1,modulus)^(-1)*(m-3) for m in L1]
                pretty_print(html(r"For each of these, $x\equiv (2\cdot 1)^{-1}(s-3)$ is %s"%(LS)))
                LS = [ls^2+3*ls+5 for ls in LS]
                pretty_print(html("And $x^2+3x+5$ gives for each of these: %s"%(LS)))
            except ZeroDivisionError:
                pretty_print(html("Since 2 doesn't have an inverse modulo $%s$, we can't use this."%modulus))
        pretty_print(html("<br/>"))


In the previous example, note that we could not proceed as over the rational
numbers by writing

$$x^2 + 3x + 5 = \left(x + \frac{3}{2}\right)^2 + \left(5 - \left(\frac{3}{2}\right)^2\right)$$

since there is no element $3/2 \in \mathbb{Z}_n$; this motivates part of the multiplication
by $4a$ in Algorithm 16.2.4. Also³, note that if $n = k\ell$ is composite and the
quadratic reduces to a linear congruence modulo $k$ or $\ell$, there may be solutions
even if $\gcd(2a,n) > 1$; see Exercise 16.8.18.


16.3 Quadratic Residues 


As the previous section makes clear, finding when square roots exist (mostly 
for odd modulus) is the core of finding a complete solution. The remainder of 
this chapter and most of the next will focus on resolving this question. 


16.3.1 Some definitions 


We first introduce two definitions, a little more formal in nature. 


Definition 16.3.1 Assume that $a \not\equiv 0 \pmod{p}$, for $p$ a prime.

• If there is a solution of $x^2 \equiv a \pmod{p}$ we say that $a$ is a quadratic
residue of $p$ (or a QR).

• If there is not a solution of $x^2 \equiv a \pmod{p}$ we say that $a$ is a quadratic
nonresidue of $p$.

Although some authors also define this notion for composite moduli (as does
Sage, see Sage note 16.3.3), we will go with the majority and reserve these
terms for prime moduli.


³ I thank Zach Teitler for discussion on this point.

Note that this is the same thing as saying that $a$ does or does not have a
square root modulo $p$, but the focus changes to $a$ instead of the square root
itself.

It is not so easy at all to determine even when something is a QR, much
less to compute the square roots, so we will take some significant time on this.


Remark 16.3.2 By the way, the terminology is explained by the fact (recall
Section 4.4) that the equivalence classes $[a]$ are called residues, so one which is
a perfect square is justly called quadratic⁴.


Sage note 16.3.3 Quadratic residues. Sage can calculate these for us, of 
course. 


    quadratic_residues(17)


[0, 1, 2, 4, 8, 9, 13, 15, 16] 


Notice that Sage counts zero as a quadratic residue (since $0^2 = 0$ always);
there are technical reasons not to consider it as one in our theoretical treatment,
as will be seen soon.

A separate function gives the smallest nonresidue, in case you need it. 


    least_quadratic_nonresidue(17)
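If you want to see what these commands are doing, the definition can be applied directly. The following plain Python sketch uses hypothetical names (`nonzero_quadratic_residues`, `least_nonresidue`) and follows Definition 16.3.1 in leaving out zero; it simply squares every nonzero residue, so it is only meant for small primes.

```python
def nonzero_quadratic_residues(p):
    """List the nonzero quadratic residues modulo a prime p by squaring 1..p-1."""
    return sorted({(x * x) % p for x in range(1, p)})


def least_nonresidue(p):
    """Return the smallest positive a that is not a square modulo the odd prime p."""
    residues = set(nonzero_quadratic_residues(p))
    return next(a for a in range(1, p) if a not in residues)


if __name__ == "__main__":
    print(nonzero_quadratic_residues(17))
    print(least_nonresidue(17))
```

Compare the first list with the Sage output above: it is the same apart from the zero.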


16.3.2 First try for new square roots 


To get more of a flavor for this, let’s explore for which $p$ it is true that
$x^2 \equiv 2 \pmod{p}$ has a solution. Or, to put it another way, when does two have a square
root modulo $p$?
First do some by hand; for what primes up to 20 does 2 have a square root?
Once you’ve done this, then evaluate the next Sage cell to look at more
data.


    @interact
    def _(odd_primes_up_to=50):
        for p in prime_range(3,odd_primes_up_to):
            solutions=solve_mod([x^2==2],p)
            if len(solutions)!=0:
                pretty_print(html(r"$x^2\equiv 2\text{ (mod }%s)$ has solutions $%s$ and $%s$"%(p,solutions[0][0],solutions[1][0])))
            else:
                pretty_print(html("No solutions modulo $%s$"%p))


Question 16.3.4 What do you think? Do you see any patterns in when a 
square root of two exists? 


As it turns out, it is quite hard to prove any such patterns you may find
without the benefit of powerful theoretical machinery. Which means it is hard
to even know whether there is a solution to a given congruence without such
machinery!


⁴ The now-standard terminology for nonresidues can cause confusion. For example, as of
this writing the Wikipedia page for a related concept used both ‘quadratic nonresidue’ and
‘non-quadratic residue’. But the Google Ngram Viewer (books.google.com/ngrams) suggests
that most academic mathematicians now use ‘quadratic nonresidue’. Then again, some older
papers (including one by Sylvester) definitely use the term, as well as at least one instance on
the OEIS and some newer books, so perhaps there is an interesting paper on the linguistics
of higher mathematics waiting to be written.


16.3.3 Some history 


In fact, it is even hard to conjecture patterns for harder cases unless you are 
quite clever. Euler was one of the first to do so. In a very unusual paper, he 
included nary a proof, just closely related conjectures to this question. 

We list here three links related to the paper. Note that if you read it
carefully, you will have hints to a solution of Question 16.3.4 in the previous
subsection; look for numbers of the form $a^2 - 2b^2$.

• Euler archive⁵ listing for original paper

• Euler archive translation⁶ of the paper into English

• Euler’s work in this paper explained by Ed Sandifer⁷ (focusing on the
cases $a^2 + nb^2$)

Next, look at two tables made by the great Italian-French mathematician
Lagrange, courtesy of the French National Library and its online repository,
Gallica⁸. (The license does not allow for commercial use of these images.)


In Figure 16.3.5, Lagrange is referring to integers of the form $t^2 + au^2$, and
then what form their divisors can have. That this corresponds to what we have
seen is clear in that $a = 1$ just means that primes can divide a sum of squares
if they are themselves of the form $y^2 + z^2$ when they are of the form $4n + 1$. (See
the discussion around Theorem 13.5.5.) For more on these tables and their history,
see the excellent book Mathematical Masterpieces [E.5.7].

[Image: scan of Lagrange’s Table III]

Figure 16.3.5 Lagrange’s Table III from “Recherches d’arithmétique”


⁵ www.math.dartmouth.edu/~euler/pages/E164.html
⁶ arxiv.org/pdf/math/0606057v1
⁷ eulerarchive.maa.org/hedi/HEDI-2005-12.pdf
⁸ gallica.bnf.fr

In Figure 16.3.6, Lagrange is referring to divisors of integers of the form
$t^2 - au^2$ instead. When $a = 2$, this should correspond to primes for which one may
have a square root of two. Note that the formulas for the divisors are for $4an + b$,
so that when $a = 2$, the table says that we will have (odd) prime divisors
of the form $8n \pm 1$ (and only that form). Does this correspond with your pattern
searching in the previous subsection? See also Theorem 42 of Euler’s paper
and Theorem 16.7.1 where we will confirm this pattern.

[Image: scan of Lagrange’s Table IV]

Figure 16.3.6 Lagrange’s Table IV from “Recherches d’arithmétique”

Historical remark 16.3.7 Joseph-Louis Lagrange. Originally from what 
is now Italy, Lagrange was Euler’s successor in Berlin in the court of Frederick 
the Great, after Euler went back to Russia under the stable (if despotic) regime 
of Catherine the Great. One interesting point to make about him is that 
Lagrange gave proofs of many of the patterns in quadratic forms (what numbers 
look like $a^2 + b^2$, $a^2 + 2b^2$, etc.) that Fermat and Euler talked about.

Although he isn’t always mentioned quite as highly in the undergraduate 
literature as his contemporaries Euler or Gauss, note that we’ve already seen 
two of his theorems (7.4.1 and 8.3.12), and Euler himself anointed him as the 
best mathematician in Europe. Later he moved to France and remained quite 
influential (as well as managing to survive the Terror, which was not true for all 
academics at the time). And if you ever read any science fiction or space stuff 
that talks about stable places to orbit being called Lagrange points — that’s 
him too! 


16.4 Send in the Groups 


What made the theory of quadratic residues/square roots work out best in the 
end was a couple of key innovations of the French mathematician Adrien-Marie 
Legendre; Gauss in particular made great use of them. 
Historical remark 16.4.1 Adrien-Marie Legendre. Legendre⁹ was Lagrange’s
successor in Paris. Like many mathematicians of the eighteenth century,
Legendre worked in many areas, including function theory and mathematical
physics. Notably, as increased professionalization of studies of higher
mathematics came about in post-Revolutionary French engineering studies (a
development historian of mathematics Judith Grabiner argues led to
rigorization of calculus), he wrote a widely used geometry textbook.

While approaching the topic historically can be beneficial, since we have 
the advantage of having developed the basics of groups and primitive roots, 
we will be able to simplify the exposition of quadratic residues a great deal by 
(somewhat anachronistically) using these concepts. 


16.4.1 Quadratic residues form a group 


Definition 16.4.2 Consider the set of all non-zero quadratic residues modulo
some prime $p$. We call this the group of quadratic residues $Q_p$.

This terminology suggests that I have a proof in my pocket for the following
theorem.

Theorem 16.4.3 The set of (non-zero) quadratic residues $Q_p$ modulo a prime
$p$ really is a group, and is even a subgroup of the group of units $U_p$.

Proof. We will proceed by showing the group axioms hold under multiplication.
Since we exclude zero and $p$ is prime, $Q_p$ is a subset of $U_p$ essentially by
definition, which will imply it is a subgroup of $U_p$ as well.

Let’s look at the three main axioms.

• It is clear that $1 \in Q_p$, since $1 \equiv 1^2$. So there is an identity.

• Next, if $a$ and $b$ are both in $Q_p$ (with $a \equiv s^2$ and $b \equiv t^2$), then $ab$ is also
a quadratic residue (since $(st)^2 \equiv s^2t^2 \equiv ab$).

• All that remains is to check that this set has inverses under multiplication.
To show this last point, assume that $a \equiv s^2 \in Q_p$, and consider $s^{-1}$ as an
element of $U_p$. Then

$$(s^{-1})^2 a \equiv (s^{-1})^2 s^2 \equiv (s \cdot s^{-1})^2 \equiv 1.$$

So by definition of inverses

$$(s^{-1})^2 \equiv a^{-1},$$

which means that if $a \in Q_p$ then $a^{-1} \in Q_p$ as well. ∎


Remark 16.4.4 For those with some additional algebraic background, it turns
out $Q_p$ is in fact a quotient group of $U_p$ as well, but we will not delve further
into this here.


16.4.2 Quadratic residues connect to primitive roots 


You might be wondering how this piece of $U_p$ connects to the most important
thing we’ve seen so far about $U_p$. Recall that $U_p$ was cyclic, that it had a
generator whose powers gave us all units modulo $p$. We called such an element
a primitive root of $p$ (recall Chapter 10).


g=mod(primitive_root(19),19); g


9 www-history.mcs.st-and.ac.uk/Biographies/Legendre.html

So let’s compare the primitive root’s powers and the quadratic residues.
Shouldn’t be too hard ... if you aren’t computing this with Sage, just try it by 
hand with an even smaller modulus, like seven or eleven. 


g=mod(primitive_root(19),19)
L=[g^n for n in range(1,19)]
print(L)
print(quadratic_residues(19))


[0, 1, 4, 5, 6, 7, 9, 11, 16, 17]


Note the pattern of which elements of $U_{19}$ (as powers of the primitive root)
are quadratic residues! This exemplifies a major fact. 


Fact 16.4.5 For odd prime modulus $p$, the quadratic residues are precisely the
even powers of a primitive root $g$.

Proof. Certainly $g^{2n} = (g^n)^2$, so the even powers of $g$ are QRs.

Now we need the other set inclusion. Suppose that $a \in Q_p$ and $a \equiv s^2$. Then
first note that $s$ and $a$ are themselves still powers of $g$ (by definition of $g$). So
let $s \equiv g^n$ and $a \equiv g^m$ for some $n, m$. Then we have the implication

$$a \equiv g^m \equiv g^{2n} \pmod{p} \implies m \equiv 2n \pmod{p-1}.$$

This is only possible if $m$ is even since $p - 1$ and $2n$ are even. ∎
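
If you are not computing with Sage, the same comparison can be sketched in plain Python. The helper names below (`smallest_primitive_root`, `nonzero_squares`, `even_powers`) are my own for illustration and are not Sage commands; the brute-force search is an assumption that only makes sense for small primes.

```python
def smallest_primitive_root(p):
    """Smallest primitive root of an odd prime p, found by brute force."""
    for g in range(2, p):
        # g is a primitive root when its powers give all p - 1 units
        if len({pow(g, k, p) for k in range(1, p)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def nonzero_squares(p):
    """The set Q_p of nonzero squares modulo p."""
    return {x * x % p for x in range(1, p)}


def even_powers(g, p):
    """The even powers g^2, g^4, ..., g^(p-1) modulo p."""
    return {pow(g, k, p) for k in range(2, p, 2)}


for p in [7, 11, 19]:
    g = smallest_primitive_root(p)
    print(p, g, even_powers(g, p) == nonzero_squares(p))
```

Each printed line should end in `True`, which is just the statement of the fact for that prime.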


Example 16.4.6 This is a very strong statement, because it does not depend 
upon the primitive root! For example, if p = 11, one can verify both 2 and 
8 are primitive roots modulo eleven; then they are clearly nonresidues, and 
moreover are odd powers of each other: 


$$8^1 \equiv 2^3 \text{ and } 2^1 \equiv 8^7 \pmod{11}.$$


This fact will turn out to be a fantastically useful theoretical way to find
$Q_p$. It will show up in lots of proofy settings. Here is a first example, a very
nice result about when a composite number is a QR. 


Proposition 16.4.7 If $n = ab$ is a factorization (not necessarily nontrivial)
of $n$, then $n$ is a QR of $p$ precisely when either both $a$ and $b$ are QRs of $p$ or
both $a$ and $b$ are not QRs of $p$.

Proof. Modulo $p$, write $a \equiv g^i$ and $b \equiv g^j$ for some $i, j$. Then $n = ab \equiv g^{i+j}$,
and $i + j$ is even exactly when $i$ and $j$ have the same parity. Because of
Fact 16.4.5, this is exactly the same thing as the conclusion of the proposition. ∎

Hence if one of $a, b$ is a QR and the other one isn’t, neither is $n = ab$.
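
Here is a small plain-Python check of this product rule, offered only as a sketch. The function names `is_qr` and `product_rule_holds` are hypothetical helpers of mine, and testing squares by brute force is an assumption suitable for small primes only.

```python
def is_qr(a, p):
    """True when a is a nonzero square modulo the prime p."""
    a %= p
    return any(x * x % p == a for x in range(1, p))


def product_rule_holds(p):
    """Check: ab is a QR exactly when a and b are both QRs or both not."""
    for a in range(1, p):
        for b in range(1, p):
            same_status = is_qr(a, p) == is_qr(b, p)
            if is_qr(a * b, p) != same_status:
                return False
    return True


print(product_rule_holds(23))
print(is_qr(-1, 23), is_qr(2, 23), is_qr(21, 23))
```

The second line prints the status of $-1$, $2$, and $21 \equiv -2$ modulo 23, so you can compare a mixed pair with its product.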


Example 16.4.8 Let’s assume that we have the pattern observed in
Question 16.3.4 and try to decide whether 21 is a QR (mod 23).

Our first step is to try to make 21 a product of two numbers we already
know something about. Since $21 \equiv -2 \pmod{23}$, we can look at $-1$ and $2$
separately. Then recall that $-1$ is not a QR (since $23 \equiv 3 \pmod{4}$) but 2 is,
from our explorations. So we would conjecture 21 is not a QR either.


quadratic_residues(23)


[0, 1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18]




We can use the same trick for other numbers congruent to $-2$ modulo $p$.
For instance, I can immediately decide that $-2 \equiv 9$ is a QR (mod 11) by the
fact that neither $-1$ nor $2$ is a QR modulo 11.


quadratic_residues(11)


[0, 1, 3, 4, 5, 9]


We will soon revisit this idea in Proposition 17.1.1. 


There is yet another way to view the tension between primitive roots and 
quadratic residues. Before moving on to the next interactive graphic, try to 
answer the following question. 


Question 16.4.9 Do you see why a quadratic residue automatically can’t 
be a primitive root? (This follows from results earlier in this chapter; see 
Exercise 16.8.10.) 
Now try our familiar graphic again, this time concentrating on which rows 
correspond to primitive roots and which ones to quadratic residues. 




Figure 16.4.10 Colored table of powers modulo n = 13 


The second column (labeled 1) contains all the residues, and by definition 
the quadratic residues are the colors located in the third column (labeled 2 as 
they are squares). See how that column is symmetric about the middle of the 
rows, with two of each of the QR colors appearing. Further, these are the same 
colors as the ones appearing in every other column in rows with a primitive 
root (the rows with every color represented); naturally, the order may be quite 
different. Finally, the second column’s color in each row that has every color 
(including black) never shows up in the third column (the one for squares); 
this corresponds to the fact that a primitive root can’t be a quadratic residue. 

Try it out interactively until the connection between the known facts and 
the graphical pattern seems plausible. 


import matplotlib.pyplot as plt
from matplotlib.ticker import IndexLocator, FuncFormatter
@interact
def power_table_plot(p=(13,prime_range(50)[2:])):
    # one color per nonzero residue, with ticks labeled from 1
    mycmap = plt.get_cmap('gist_earth',p-1)
    myloc = IndexLocator(floor(p/5),.5)
    myform = FuncFormatter(lambda x,y: int(x+1))
    cbaropts = { 'ticks':myloc, 'drawedges':True,
        'boundaries':srange(.5,p+.5,1)}
    # the row for a holds the powers a^0, a^1, ..., a^p modulo p
    P=matrix_plot(matrix(p-1,[mod(a,p)^b for a in range(1,p)
        for b in srange(p+1)]),cmap=mycmap, colorbar=True,
        colorbar_options=cbaropts, ticks=[myloc,myloc],
        tick_formatter=[None,myform])
    show(P,figsize=6)


These observations may not seem as interesting, but they will pay off; in the 
next section we will obtain a crucial criterion for computing quadratic residues 
by observing a similar pattern! 


16.5 Euler’s Criterion 


As it happens, Fact 16.4.5 is a terrible way to actually find quadratic residues 
for a given p in general, since it involves finding a primitive root and listing 
lots of powers. We need both theory and practice. 

There is a much easier way. First recall our observation in Theorem 12.3.2: 


$$a^{(p-1)/2} \equiv \pm 1 \pmod{p} \text{ for all } a \text{ not divisible by } p.$$


We visualized it as the middle column in this graphic. 




Figure 16.5.1 Colored table of powers modulo n = 13 


But as so often in mathematics, solving one question leads to another; after 
all, Theorem 12.3.2 didn’t say when we got plus or minus 1, just that these are 
the only possibilities. Observe carefully above which rows start with the colors 
corresponding to squares (the column labeled 2), comparing them to whether 
the middle column is black or white. 

Don’t go on until you have tried this (interactively below, or even by hand
with $p = 7$ or $p = 11$). It’s important to understand what is being asked before
looking for patterns.
import matplotlib.pyplot as plt
from matplotlib.ticker import IndexLocator, FuncFormatter
@interact
def power_table_plot(p=(13,prime_range(50)[2:])):
    # one color per nonzero residue, with ticks labeled from 1
    mycmap = plt.get_cmap('gist_earth',p-1)
    myloc = IndexLocator(floor(p/5),.5)
    myform = FuncFormatter(lambda x,y: int(x+1))
    cbaropts = { 'ticks':myloc, 'drawedges':True,
        'boundaries':srange(.5,p+.5,1)}
    # the row for a holds the powers a^0, a^1, ..., a^p modulo p
    P=matrix_plot(matrix(p-1,[mod(a,p)^b for a in range(1,p)
        for b in srange(p+1)]),cmap=mycmap, colorbar=True,
        colorbar_options=cbaropts, ticks=[myloc,myloc],
        tick_formatter=[None,myform])
    show(P,figsize=6)


Hopefully you did notice a pattern. Let’s formalize it as follows. 


Theorem 16.5.2 Euler’s Criterion. If $p$ is an odd prime, then for all
integers $a$ not a multiple of $p$, the sign of the following expression determines
whether $a$ is a QR.

$$a^{(p-1)/2} \equiv \pm 1 \pmod{p}$$

We obtain $+1$ if $a$ is a QR, otherwise $-1$.
Proof. Let $g$ be a primitive root of $p$, so that $a \equiv g^i$ for some $i$. Then if we let
$h = g^{(p-1)/2}$, Fermat’s Little Theorem shows that

$$h^2 \equiv g^{p-1} \equiv 1 \pmod{p}.$$

Because $p$ is prime, this forces $h \equiv \pm 1$. Since $g$ is a primitive root, $h \equiv 1$ is
impossible, so $h \equiv -1$. But then

$$a^{(p-1)/2} \equiv (g^i)^{(p-1)/2} \equiv \left(g^{(p-1)/2}\right)^i \equiv h^i \equiv (-1)^i.$$

This is $+1$ when $i$ is even and $-1$ when $i$ is odd. Finally, according to
Fact 16.4.5, this is precisely when $a$ is a quadratic residue and nonresidue,
respectively. ∎
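
A quick numerical check may make the criterion feel less mysterious. The sketch below uses only Python’s built-in three-argument `pow`; the names `euler_sign` and `is_square_mod` are my own, and the code assumes $p$ is an odd prime and $a$ is not a multiple of $p$.

```python
def euler_sign(a, p):
    """Return +1 or -1 according to a^((p-1)/2) modulo the odd prime p."""
    r = pow(a, (p - 1) // 2, p)   # r is either 1 or p - 1
    return 1 if r == 1 else -1


def is_square_mod(a, p):
    """Brute-force test of whether a is a nonzero square modulo p."""
    return any(x * x % p == a % p for x in range(1, p))


p = 13
for a in range(1, p):
    print(a, euler_sign(a, p), is_square_mod(a, p))
```

In every row the sign is $+1$ exactly when the last entry is `True`.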


Example 16.5.3 This immediately gives the result in Fact 16.1.2 that $-1$ has
a square root modulo an odd prime $p$ precisely when $p \equiv 1 \pmod{4}$, because
$(-1)^{(p-1)/2} = +1$ precisely when $(p-1)/2$ is even, or $p \equiv 1 \pmod{4}$. That is
much easier than our previous ad-hoc way of doing it!


Example 16.5.4 Let’s use this to confirm, for $p = 17$, two of the values
implicit in the list we obtained in Sage note 16.3.3.

First, that list included 2 as a QR. Since $(p-1)/2 = 8$, the calculation is
fairly simple:

$$2^8 \equiv 16^2 \equiv (-1)^2 \equiv 1 \pmod{17},$$

as expected.

Can we confirm that 3 should not be on the list? Using Euler’s Criterion,
we have

$$3^8 \equiv (3^4)^2 \equiv 81^2 \equiv (-4)^2 \equiv 16 \equiv -1 \pmod{17},$$

which correctly shows 3 is not a quadratic residue.
We will now greatly amplify the power of our work thus far. 

16.6 Introducing the Legendre Symbol


Consider the lowly notion of congruence, along with its symbol $\equiv$. It is easy to
explain; yet Gauss revolutionized number theory and made it more accessible 
to others with it. 

In Legendre’s research into questions of residues, he discovered that certain 
powers were always either $\pm 1$, omitting multiples of what we would today call
the modulus. Some of what he found was essentially Theorem 16.5.2. This 
enabled the great innovation of Legendre’s we alluded to earlier. 

What of the plus or minus 1; why is this so innovative? To quote an article 
[E.7.5] on this subject, if one has a symbol for it, it becomes 


... more than a notational convenience ... Legendre reifies this
concept, and makes it into an object of independent study.


—Steven H. Weintraub 


In our modern terms, Legendre takes advantage of the fact that $a \equiv g^i$ is an
even power exactly when $a$ is a QR, and $(-1)^i = 1$ precisely when $i$ is even
(and hence precisely when $a$ is a QR). This is the so-called Legendre symbol.
(However, he did not use the term QR, just the symbol!¹⁰)


Definition 16.6.1 We write $\left(\frac{a}{p}\right)$ for the Legendre symbol. Given that $p$ is an
odd prime, for $a$ coprime to $p$ we set

$$\left(\frac{a}{p}\right) = 1 \text{ if } a \text{ is a QR modulo } p, \text{ and } \left(\frac{a}{p}\right) = -1 \text{ otherwise.}$$

We define the Legendre symbol of $a$ modulo $p$ to be zero if $p \mid a$.

Example 16.6.2 We can now restate the main content of Fact 16.1.2: For
odd $p$, we have that $\left(\frac{-1}{p}\right) = 1$ if and only if $p \equiv 1 \pmod{4}$.


Example 16.6.3 We can also restate Example 16.5.4 as $\left(\frac{2}{17}\right) = 1$ and
$\left(\frac{3}{17}\right) = -1$.


The command in Sage is pretty straightforward. We use it, and then demon- 
strate it via an interact. 


legendre_symbol(-2,11)


1 


@interact
def _(p=(17,prime_range(50))):
    for n in [q for q in quadratic_residues(p) if q != 0]:
        pretty_print(html(r"$%s$ is a QR of $%s$ and $\left(\frac{%s}{%s}\right)=%s$"%(n,p,n,p,
            legendre_symbol(n,p))))


Remark 16.6.4 A brief note is in order regarding the special status of zero in
Definition 16.3.1, especially since Sage includes zero as a QR.

First, this recognizes that zero is a special case: only $0^2 \equiv 0$, while
$1 \equiv 1^2 \equiv (-1)^2$ (and everything else) usually has two square roots modulo a
prime.



10 Unfortunately, despite the suggestion on mathoverflow.net/questions/15447 of “a on p”
for pronouncing it, there does not seem to be a standard way to read this aloud.




A deeper reason is that this status allows us to conveniently ignore the only
integer from $0$ to $p - 1$ which is not in $U_p$. In fact, the multiplicative property
Proposition 16.4.7 ensures you can consider $x \mapsto \left(\frac{x}{p}\right)$ to be a function from
$U_p$ to $\{1, -1\}$ of the kind we call a group homomorphism. (Indeed, it
gets us from a cyclic group of order $p - 1$ to a cyclic group of order 2, with
“kernel” the cyclic subgroup of order $(p - 1)/2$ that we already mentioned in
Theorem 16.4.3.)
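
For readers who like to see the homomorphism property concretely, here is a plain-Python sketch. The function `legendre` below is my own stand-in built on Euler’s Criterion (it is not Sage’s `legendre_symbol`), and `is_homomorphism` and `kernel` are hypothetical helper names; all assume $p$ is an odd prime.

```python
def legendre(a, p):
    """Legendre symbol of a modulo an odd prime p, via Euler's Criterion."""
    r = pow(a, (p - 1) // 2, p)      # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


def is_homomorphism(p):
    """Check that the symbol of a product is the product of the symbols."""
    return all(legendre(x * y, p) == legendre(x, p) * legendre(y, p)
               for x in range(1, p) for y in range(1, p))


def kernel(p):
    """Units sent to +1, which should be the nonzero quadratic residues."""
    return [x for x in range(1, p) if legendre(x, p) == 1]


print(is_homomorphism(19))
print(kernel(19), len(kernel(19)))
```

For $p = 19$ the kernel has $(19 - 1)/2 = 9$ elements, matching the subgroup size in the remark.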

Here’s a final introductory experiment with Legendre symbols. What is the 
sum of all Legendre symbols for a given (odd) prime? (As usual, you can do 
this by hand for small primes if you aren’t computing.) 


@interact
def _(p=(19,prime_range(100)[1:])):
    L = [legendre_symbol(a,p) for a in [0..p-1]]
    pretty_print(html(r"All Legendre symbols $\left(\frac{a}{%s}\right)$ can be listed:"%p))
    print(L)
    pretty_print(html("And they sum up to $%s$"%sum(L)))


This is cool, and a nice example of the kind of fun one can have experi- 
menting. What do you think? Do you think we can prove it? Try doing so in 
Exercise 16.8.8. (For harder exercises of this type, see [E.4.6, Exercise 9.7].) 


16.7 Our First Full Computation 


We will now complete our investigations begun in Subsection 16.3.2 by
calculating $\left(\frac{2}{p}\right)$ using Euler’s Criterion. (There are many proofs of the following
fact; a nice one using only the existence of a primitive root is [E.7.16].)


Theorem 16.7.1 When Two is a Quadratic Residue. Whether two is a quadratic
residue modulo an odd prime $p$ is determined as follows.

• $\left(\frac{2}{p}\right) = 1$ if $p \equiv \pm 1 \pmod{8}$

• $\left(\frac{2}{p}\right) = -1$ if $p \equiv \pm 3 \pmod{8}$

Proof. We will show this by writing $(p - 1)!$ in two different ways below in
Proof 16.7.1. ∎


Example 16.7.2 It is easiest to approach the proof first with an example. We
will take $p = 11$.

We can write

$$(11 - 1)! = 10! = 1(2)(3)(4)(5)(6)(7)(8)(9)(10)$$
$$= (2 \cdot 4 \cdot 6 \cdot 8 \cdot 10) \cdot (1 \cdot 3 \cdot 5 \cdot 7 \cdot 9)$$
$$= 2^5 \cdot (1 \cdot 2 \cdot 3 \cdot 4 \cdot 5) \cdot (1 \cdot 3 \cdot 5 \cdot 7 \cdot 9).$$

Notice that 1, 3, 5 repeat; these are all the odd numbers less than or equal to
$\frac{11 - 1}{2} = 5$.

Now we will try to create $10!$ again from the numbers on the right after we
have factored out $2^5$. In this case, the only ones repeated are 1, 3, 5, so we are
almost there.

But observe that $-1, -3, -5 \equiv 10, 8, 6$, which are exactly the missing pieces
of $10!$. So I will factor out $-1$ from those three, thus:

$$10! = 2^5 \cdot (1 \cdot 2 \cdot 3 \cdot 4 \cdot 5) \cdot (1 \cdot 3 \cdot 5 \cdot 7 \cdot 9)$$
$$= 2^5 \cdot (1 \cdot 2 \cdot 3 \cdot 4 \cdot 5) \cdot (-1)^3 \cdot (-1 \cdot -3 \cdot -5) \cdot (7 \cdot 9)$$
$$\equiv 2^5 \cdot (-1)^3 \cdot (1 \cdot 2 \cdot 3 \cdot 4 \cdot 5) \cdot (10 \cdot 8 \cdot 6)(7 \cdot 9) \equiv (-1)^3 \cdot 2^5 \cdot (10!) \pmod{11}.$$

Finally, cancel $10!$ from the first and last element of the preceding chain of
congruences, and we get

$$2^{(11-1)/2} = 2^5 \equiv (-1)^3 \equiv -1 \pmod{11}$$

and so 2 is not a QR of 11.
Proof of Theorem 16.7.1. Proving the general case basically follows the pro- 
cedure in the previous example to its natural conclusion; there was nothing 
special in the above argument about p = 11. 

After writing $(p-1)!$ and factoring out $2^{(p-1)/2}$, the “repeated” numbers will be
the odd numbers between 1 and $(p-1)/2$. Clearly the only “missing” numbers
are even ones between $(p-1)/2$ and $p$, which are just congruent to the negatives
of the “repeated” odd numbers, so the same argument as above with $(p - 1)!$
will work.

It remains to check when we have a QR and when we do not. 


• If $p \equiv 3 \pmod{4}$, like $p = 11$, then $(p - 1)/2$ is odd so there will be

$$\left(\frac{p-1}{2} + 1\right)\frac{1}{2} = \frac{p+1}{4}$$

repeated factors, as 1, 3, 5 above.

• If $p \equiv 1 \pmod{4}$ (like $p = 17$), on the other hand, then $(p - 1)/2$ is even
and there are exactly

$$\left(\frac{p-1}{2}\right)\frac{1}{2} = \frac{p-1}{4}$$

repeated factors (in that case, 1, 3, 5, 7).

In either case, whether the number of repeated factors ($(p + 1)/4$ or $(p - 1)/4$,
respectively) is even or odd determines whether 2 is a quadratic residue.

Now we simply confirm the formula given in Theorem 16.7.1 in all four possible 
cases: 


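• If $p \equiv 1 \pmod{4}$ and $\frac{p-1}{4}$ is even, $\left(\frac{2}{p}\right) = 1$. These conditions imply
$p \equiv 1 \pmod{8}$, so 2 is a QR when $p \equiv 1 \pmod{8}$.

• If $p \equiv 1 \pmod{4}$ and $\frac{p-1}{4}$ is odd, $\left(\frac{2}{p}\right) = -1$. These conditions imply
$p \equiv 5 \pmod{8}$, so 2 is not a QR when $p \equiv 5 \pmod{8}$.

• If $p \equiv 3 \pmod{4}$ and $\frac{p+1}{4}$ is even, $\left(\frac{2}{p}\right) = 1$. These conditions imply
$p \equiv 7 \pmod{8}$, so 2 is a QR when $p \equiv 7 \pmod{8}$.

• If $p \equiv 3 \pmod{4}$ and $\frac{p+1}{4}$ is odd, $\left(\frac{2}{p}\right) = -1$. These conditions imply
$p \equiv 3 \pmod{8}$, so 2 is not a QR when $p \equiv 3 \pmod{8}$.
∎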
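
The counting in this proof can be checked by machine for small primes. The following plain-Python sketch is only an illustration; the names `euler_two`, `repeated_odd_factors`, and `two_by_residue_class` are my own, and the list of primes is an arbitrary choice.

```python
def euler_two(p):
    """Sign of 2^((p-1)/2) modulo the odd prime p (Euler's Criterion)."""
    r = pow(2, (p - 1) // 2, p)
    return 1 if r == 1 else -1


def repeated_odd_factors(p):
    """How many odd numbers lie between 1 and (p - 1)/2 inclusive."""
    return len(range(1, (p - 1) // 2 + 1, 2))


def two_by_residue_class(p):
    """The value predicted from p modulo 8: +1 for 1 or 7, -1 for 3 or 5."""
    return 1 if p % 8 in (1, 7) else -1


for p in [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]:
    count = repeated_odd_factors(p)
    print(p, p % 8, count, (-1) ** count, euler_two(p), two_by_residue_class(p))
```

The last three columns of each row should agree: the parity of the number of repeated factors, Euler’s Criterion, and the residue class of $p$ modulo 8 all give the same sign.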


The following Sage cell shows off Theorem 16.7.1, incidentally confirming 
a computation in Example 16.5.4. 


@interact
def _(p = (17,prime_range(3,100))):
    l = legendre_symbol(2,p)
    r = p%8
    pretty_print(html(r"The prime $%s\equiv %s\text{ (mod }8)$ and $\left(\frac{2}{%s}\right)=%s$."%(p,r,p,l)))


In the next chapter, we will vastly expand our repertoire of Legendre sym- 
bols, and see many applications. 


16.8 Exercises 


1. Fill in all the details of Example 16.0.2 for the congruences $x^2 + 5x + 5 \equiv 0$
(mod 5) and $x^2 + 5x + 7 \equiv 0$ (mod n).


2. Prove that if $e > 1$, then there is no solution to
$$x^2 \equiv -1 \pmod{2^e}.$$
Use our knowledge of squares modulo 4.


3. For what n does —1 have a square root modulo n? (Hint: use prime 
factorization and the previous problem along with results earlier in the 
chapter.) 

4. Clearly 4 has a square root modulo 7. Find all square roots of 4 modulo 
$7^3$ without using Sage or trying all 343 possibilities. Why is this exercise
not as challenging as it seems, and what would you do to make it harder? 


5. Solve $x^2 + 3x + 5 \equiv 0 \pmod{15}$ using completion of squares and trial and
error for square roots.

Exercise Group. Solve the following congruences without using a computer.
6. $x^2 + 6x + 5 \equiv 0 \pmod{17}$
7. $5x^2 + 3x + 1 \equiv 0 \pmod{17}$

8. Prove that if $p$ is an odd prime,

$$\sum_{a=0}^{p-1} \left(\frac{a}{p}\right) = 0.$$


9. Explore and conjecture a formula for

$$\sum_{a \in Q_p} a,$$

possibly dependent upon some congruence class for $p$.

10. Show that a quadratic residue can’t be a primitive root if p > 2. 

11. Show that if $p$ is an odd prime, then there are exactly $\frac{p-1}{2} - \phi(p - 1)$
residues which are neither QRs nor primitive roots. (Hint: don’t think 
too hard — just do the obvious counting up.) 

12. Use Euler’s Criterion to find all quadratic residues of 13. 

13. Evaluate Legendre symbols for all $a \neq 0$ where $p = 7$, using Euler’s
Criterion. 

14. Explore for a pattern for when —5 is a quadratic residue. Try not to use 
any fancy criteria, but just to seek a pattern based on the number. 
15. Use Euler’s Criterion and the ideas of Proof 16.7.1 to prove that 3 has a
square root modulo $p$ if $p \equiv 1 \pmod{12}$. (See Proposition 17.3.4 for full
details of $\left(\frac{3}{p}\right)$.)

16. Explore for a pattern for, given $p$, how many pairs of consecutive residues
are both actually quadratic residues. Then connect this idea to the
following formula, which you should evaluate for the same values of $p$:

[displayed sum illegible in the source; only the lower limit $a = 1$ survives]

(A harder problem is to prove your evaluation works for all $p$.)

17. Show that, given a power of two, $2^e$, greater than four, $x^2 \equiv a \pmod{2^e}$
either has zero or four solutions. (Remark 7.2.7 or even Exercise 7.7.15
may be useful here.)

18. Let $n = pq$ for odd primes $p, q$. Create a quadratic $ax^2 + bx + c$ for which
$\gcd(2a, n) = p > 1$ (and reduces to a solvable linear congruence modulo
$p$) and which is a (solvable) quadratic modulo $q$, for which the quadratic
also has solutions modulo $n$. (Hint: Try small $p, q$ and $a = p$.)


Summary: Solving Quadratic Congruences 


This chapter continues discussion of quadratic entities, but returns to the 
context of solving congruences. Just like in high school algebra, one can move 
from solving linear to quadratic! 


1. Section 16.1 continues our usual practice of review and exploration, this
time by reminding us of many square roots modulo $n$ we have already
found.

2. Next, we become systematic in finding an equivalent to the quadratic
formula, by Completing the square modulo n.

3. The next section introduces the important definition of quadratic residues
in Definition 16.3.1, along with some examples and history.

4. It turns out that the set of (non-zero) quadratic residues for a given
modulus is a group (Theorem 16.4.3), and we immediately use this in
Fact 16.4.5 to characterize them in a way that we will use again and
again.

5. We then reinterpret the middle column of Figure 16.5.1 as the incredibly
useful Euler’s Criterion.

6. The second-to-last section gives us a symbolic way to treat quadratic
residues, via the Legendre symbol (Definition 16.6.1).

7. Finally, we bring all of this together in computing When Two is a
Quadratic Residue.


The Exercises give a wide variety of practice, from solving full congruences to 
interesting theory and getting lists of residues. 

Chapter 17


Quadratic Reciprocity 


So far, we have determined at least when some quadratic congruences have 
solutions, but at the pace set thus far, most cases should seem beyond reach. 
We certainly won’t want to use Theorem 16.5.2 directly for every single one. 
It turns out that finding out when numbers have square roots (mod p) is 
not hopeless — quite the opposite is true! After raising our spirits with some 
simple but powerful observations, we will make our way to the great theorem 
that is the title of this chapter. Using it, we will derive almost effortlessly 
results regarding quadratic residues that originally took a great deal of work. 


17.1 More Legendre Symbols 


Let’s begin by calculating some more individual Legendre symbols. Now that 
we have seen a little bit of harder theory, we may appreciate some straight- 
forward techniques that can work in lucky circumstances. (Seeing that these 
techniques are limited may also motivate our theoretical work in the remainder 
of the chapter.) 

First, recall we proved the following as Proposition 16.4.7: 


Proposition 17.1.1 If n = ab is a factorization (not necessarily nontrivial) 
of n, then n is a QR of p precisely when either both a and b are QRs of p or 
both a and b are not QRs of p. In terms of Legendre symbols: 


$$\left(\frac{n}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$$


Example 17.1.2 Let’s try to compute $\left(\frac{8}{19}\right)$. Here, factoring will help;

$$\left(\frac{8}{19}\right) = \left(\frac{4}{19}\right)\left(\frac{2}{19}\right).$$

Since 4 is a perfect square, its symbol is one, and by Theorem 16.7.1 we know
that two is not a QR modulo 19. Multiplication yields $1 \cdot (-1) = -1$, so eight
doesn’t have a square root there either.


There is another useful computational fact that comes from the observation
that $x^2 \equiv a \pmod{n}$ if and only if $x^2 \equiv a + n \pmod{n}$.


Proposition 17.1.3 If $a \equiv b \pmod{p}$, then $\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$.

Example 17.1.4 What is $\left(\frac{62}{19}\right)$? On the one hand,

$$\left(\frac{62}{19}\right) = \left(\frac{62 - 57}{19}\right) = \left(\frac{5}{19}\right),$$

but we don’t know this yet either. On the other hand,

$$\left(\frac{62}{19}\right) = \left(\frac{62 + 19}{19}\right) = \left(\frac{81}{19}\right).$$

Since 81 is a perfect square in any modulus, the symbol equals 1.
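
The shifting trick in this example can be automated, at least as a rough sketch. The function name `shift_to_square` is hypothetical, and the code assumes $a \geq 0$ and that $p$ is an odd prime; it uses `math.isqrt` from the Python standard library. The search bound is chosen so that a perfect square of the form $a + kp$ is found whenever one exists, and the function returns `None` otherwise.

```python
import math


def shift_to_square(a, p):
    """Return (k, x) with a + k*p == x*x for the smallest k >= 0, else None.

    Assumes a >= 0 and p an odd prime.  If a is a square modulo p, some
    shift with k < 2*sqrt(a) + p is a perfect square, so the search is finite.
    """
    for k in range(2 * math.isqrt(a) + p + 2):
        n = a + k * p
        x = math.isqrt(n)
        if x * x == n:
            return k, x
    return None


print(shift_to_square(62, 19))
print(shift_to_square(5, 19))
print(shift_to_square(8, 19))
```

The first call recovers $62 + 19 = 81 = 9^2$; the last returns `None`, in line with the earlier computation that eight has no square root modulo 19.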


We can use these ideas to calculate a lot more Legendre symbols! Here is 
some more practice. 


Example 17.1.5 Before continuing, alternately try each of these strategies 
until you either get to a perfect square or a number we already know is (or 
isn’t) a residue. (See also Exercise 17.7.3.) 


Sage note 17.1.6 Check your work. You can always check your work, if 
you wish, using Sage. 


It turns out you can resolve some theoretical questions with these tech- 
niques. 


Fact 17.1.7 There are always consecutive quadratic residues for $p > 5$.
Proof. First, we know that 1, 4, 9 are all quadratic residues. Thus, if at least
one of 2, 5, 10 was also a QR, then we could guarantee that there were always
consecutive quadratic residues somewhere!

As it turns out, if $p = 5$ this doesn’t work, because the only (nonzero) QRs are
$\pm 1$ for that prime. But if $p = 7$, then $a = 1$ and $a = 9 \equiv 2$ are consecutive.
Now suppose $p > 7$ is prime. Then at least one of 2, 5, 10 must be a QR, since
one of these things must be true:

• 2 could be a QR
• 5 could be a QR
• If 2 and 5 both aren’t, then the computation

$$\left(\frac{10}{p}\right) = \left(\frac{2}{p}\right)\left(\frac{5}{p}\right) = (-1)(-1) = 1$$

means 10 is a QR!
∎

Thus we see that calculation and theory must go hand in hand; they are 
not separate. 
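
In that spirit, here is a short plain-Python sketch that finds the consecutive pair promised by Fact 17.1.7. The names `is_qr` and `consecutive_pair` are my own, and the brute-force test for squares is an assumption that is fine for small primes only.

```python
def is_qr(a, p):
    """True when a is a nonzero square modulo the prime p."""
    a %= p
    return any(x * x % p == a for x in range(1, p))


def consecutive_pair(p):
    """Return the first of (1, 2), (4, 5), (9, 10) whose second entry is a QR."""
    for n in (1, 4, 9):
        if is_qr(n + 1, p):
            return n, n + 1
    return None   # happens for p = 5, where the argument breaks down


for p in [5, 7, 11, 13, 17, 19, 23]:
    print(p, consecutive_pair(p))
```

For $p = 5$ the function returns `None`, which matches the exception noted in the proof.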


17.2 Another Criterion 


Now, we might want to do something more general than just try to compute
Legendre symbols one by one. Notice that what we did in using Euler’s
Criterion to find $\left(\frac{2}{p}\right)$ was to look at numbers of the form $2x$ and factor out
2. So one might ask whether something like this calculation could work with
general $a$ and numbers like $ax$ to find a better theoretical result.

It turns out that this is true. We are going to follow the steps of Gauss’
protégé Gotthold Eisenstein here to find a way to evaluate $\left(\frac{a}{p}\right)$ for $p$ an odd
prime and $\gcd(a, p) = 1$. It will be slow, and we won’t see the payoff until we
prove Theorem 17.4.1, but it will give us good practice in thinking about the
numbers themselves.


Historical remark 17.2.1 Gotthold Eisenstein. Gotthold Eisenstein¹
was yet another brilliant young mathematician who came out of nowhere but
died young because he couldn’t find a job which could help him financially
enough to deal with his chronic illness. His work in several areas of algebra
and function theory is still considered forward-looking. Of particular interest
for this text is the Eisenstein integers², a generalization of the Gaussian integers
(see Exercise 14.4.2).

I say “yet another” because this is similar to the story of Niels Abel (after 
whom Abelian groups are named), and quite likely would have been the story 
of Evariste Galois if he hadn’t been killed in a duel first; unfortunately, their 
mathematics is mostly outside the bounds of this text. 


17.2.1 Laying the foundation 


First, let’s introduce a new set and look at a couple of properties. I strongly
advise following along with a prime like $p = 11$ or $p = 13$.


Definition 17.2.2 Fix an odd prime $p$. Let $E$ be the set of positive even
numbers less than $p$. That is,

$$E = \{2, 4, 6, \ldots, p - 1\}.$$

Next, given $a$ coprime to $p$, let the set of multiples of $a$ by even numbers be
denoted

$$aE = \{2a, 4a, 6a, \ldots, (p - 1)a\}.$$

Finally, find the remainder of each element of $aE$ modulo $p$, as a nonnegative
integer. The set of all such remainders we call $\overline{aE}$; for convenience we may
write $ae - kp = r_{a,e}$ for the remainder (and quotient $k$).


1 www-groups.dcs.st-and.ac.uk/history/Biographies/Eisenstein.html
2 mathworld.wolfram.com/EisensteinInteger.html

The construction of this should ring bells, because just as in Theorem 16.7.1
and Lemma 13.3.3 we could potentially factor out $(p-1)/2$ factors of $a$ from a
product of the elements of $aE$. (Also, here and elsewhere we are not considering
the numbers in $\overline{aE}$ as elements of $\mathbb{Z}_p$, but as integers.)


Claim 17.2.3 Consider the set of (least nonnegative) remainders modulo $p$ of
numbers of the form $(-1)^x x$ for $x \in \overline{aE}$. Then as sets we have

$$\{ \text{Remainder of } (-1)^x x \mid x \in \overline{aE} \} = E.$$

Proof. First, we claim both sets only contain even numbers. Recall that
everything in $\overline{aE}$ is less than $p$.

• If $x$ is even, then $(-1)^x x$ is just $x$, which will then be the remainder.

• If $x$ is odd, then $(-1)^x x = -x$ has remainder $p - x$, which (as the
difference of two odds) is also even.

It remains to show the elements of the set in question are all different.
Suppose any two such numbers were the same; then for some even numbers $e$
and $e'$, and quotients $k$ and $k'$, we have

$$(-1)^{r_{a,e}} (ae - kp) \equiv (-1)^{r_{a,e'}} (ae' - k'p) \pmod{p}.$$

We can reduce this further by ignoring multiples of $p$, and even further by
observing that $\gcd(a, p) = 1$ so we can cancel $a$ from the remaining congruence.
Then

$$e \equiv \pm e' \pmod{p}.$$

If $e$ and $e'$ are different then $e \not\equiv e'$, so the only option would be $e \equiv -e'$. This
directly yields $e + e' \equiv 0$. But numbers in $E$ are positive and less than $p$, so
$0 < e + e' < 2p$. Since $p$ is odd we also cannot have the sum of two evens
$e + e' = p$, so the only remaining choice is that $e = e'$. Thus the set in question
consists of $(p-1)/2$ distinct positive even numbers less than $p$, which means
it is all of $E$. ∎


Example 17.2.4 For instance, with $p = 11$ and $a = 3$ we get

$$E = \{2, 4, 6, 8, 10\} \text{ and } \overline{aE} = \{6, 1, 7, 2, 8\}.$$

The set in the claim is then

$$\{(-1)^6 6, (-1)^1 1, (-1)^7 7, (-1)^2 2, (-1)^8 8\} = \{6, 10, 4, 2, 8\}.$$
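
To follow along with other primes, the sets in this example can be generated with a few lines of plain Python. This is only a sketch; the names `even_set`, `remainders`, and `signed_remainders` are my own labels for the set $E$, the remainders of the multiples $ae$, and the set appearing in Claim 17.2.3.

```python
def even_set(p):
    """The set E of positive even numbers less than the odd prime p."""
    return list(range(2, p, 2))


def remainders(a, p):
    """Remainders of a*e modulo p, for e running through E in order."""
    return [(a * e) % p for e in even_set(p)]


def signed_remainders(a, p):
    """Remainders modulo p of (-1)^x * x for each remainder x above."""
    return [(((-1) ** x) * x) % p for x in remainders(a, p)]


print(even_set(11))
print(remainders(3, 11))
print(signed_remainders(3, 11))
```

Sorting the last list gives back $E$, as the claim predicts; trying other values of $a$ and $p$ is a good way to convince yourself this was not an accident.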


17.2.2 Getting the new criterion 


Now we will try to use this set to arrive at something similar to Euler’s
Criterion. Our goal would be to use it (since we know it corresponds to Legendre
symbols), but with something different and hopefully easier to compute. Still,
we would need to arrive at $a^{(p-1)/2}$ in the end, so let’s follow some steps that
might lead us in that direction.

As mentioned above, the most crucial thing to notice is that the desired
exponent $(p - 1)/2$ is exactly the number of elements in $E$. So a first step
would be to multiply all the elements of $aE$:

$$\prod_{e \in E} ae = a^{(p-1)/2} \prod_{e \in E} e.$$

Now let us reduce modulo $p$; recall the notation $r_{a,e}$ for the remainder of $ae$ in
Definition 17.2.2. This gives a congruence:

$$\prod_{e \in E} r_{a,e} \equiv a^{(p-1)/2} \prod_{e \in E} e \pmod{p}.$$

Focus temporarily just on the product of $e$s on the right hand side. Using
Claim 17.2.3 and factoring out all the powers of $(-1)$, we can write

$$\prod_{e \in E} e \equiv \prod_{e \in E} (-1)^{r_{a,e}} r_{a,e} = (-1)^{\sum_{e \in E} r_{a,e}} \prod_{e \in E} r_{a,e}.$$

Now substitute everything in the congruences. We obtain

$$\prod_{e \in E} r_{a,e} \equiv a^{(p-1)/2} \prod_{e \in E} e \equiv a^{(p-1)/2} (-1)^{\sum_{e \in E} r_{a,e}} \prod_{e \in E} r_{a,e}.$$

Now if we cancel the product of the remainders and note that dividing and
multiplying by powers of $(-1)$ is the same thing, we can connect to
Theorem 16.5.2:

$$a^{(p-1)/2} \equiv (-1)^{\sum_{e \in E} r_{a,e}}.$$

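Example 17.2.5 For instance, with $p = 11$ and $a = 3$ we can write $\prod_{e \in E} ae$
in two different ways, using first simple reduction and then Example 17.2.4:


$$6 \cdot 12 \cdot 18 \cdot 24 \cdot 30 \equiv 6 \cdot 1 \cdot 7 \cdot 2 \cdot 8$$


$$6 \cdot 12 \cdot 18 \cdot 24 \cdot 30 = 3^5 \cdot 2 \cdot 4 \cdot 6 \cdot 8 \cdot 10 \equiv 3^5 (-1)^{6+1+7+2+8} \cdot 6 \cdot 1 \cdot 7 \cdot 2 \cdot 8.$$


Checking, we see that $6 + 1 + 7 + 2 + 8 = 24$ is even. So by Theorem 16.5.2 $a$ should
be a QR modulo $p$, and $11 + 11 + 3 = 25 = 5^2$ so in this case it is easy to verify
by hand that $\left(\frac{3}{11}\right) = 1$.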


More generally, we have the following fact. 


Fact 17.2.6
$$\left(\frac{a}{p}\right) = (-1)^{\sum_{e \in E} r_{a,e}}.$$


Proof. Use Euler’s Criterion and the above steps. ∎
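The relationship in Fact 17.2.6 can be spot-checked numerically. The following is only a sketch in plain Python (not Sage), with function names chosen for this illustration; it compares $a^{(p-1)/2}$ modulo $p$ with the sign coming from the parity of the remainder sum.

```python
def euler_side(a, p):
    """a^((p-1)/2) reduced modulo p, as in Euler's Criterion."""
    return pow(a, (p - 1) // 2, p)


def remainder_sum(a, p):
    """Sum of the remainders r_{a,e} of a*e modulo p over even e in E."""
    return sum((a * e) % p for e in range(2, p, 2))


def sign_from_remainders(a, p):
    """(-1) to the remainder sum, written as a residue (1 or p-1) modulo p."""
    return 1 if remainder_sum(a, p) % 2 == 0 else p - 1


# Example 17.2.5: the remainders 6, 1, 7, 2, 8 add up to 24, which is even.
print(remainder_sum(3, 11), euler_side(3, 11), sign_from_remainders(3, 11))   # 24 1 1
```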


What have we done? We have reduced evaluating the Legendre symbol 
(and hence deciding whether things have square roots modulo p) to calculating 
the parity of a certain sum. Given that in the previous chapter we had to 
calculate fairly large powers of modular integers, this could be an important 
improvement. 


Remark 17.2.7 Transforming such computations to a simple parity (or other) 
check is very common in algebra and number theory. 


17.2.3 The final form 


Fact 17.2.6 is still somewhat unwieldy, so there is a final simplification. 
Recall that these $r_{a,e}$ come from the remainders of $ae$ for $e \in E$. Indeed, we could
have used the Division Algorithm directly in defining them:


$$ae = p \left\lfloor \frac{ae}{p} \right\rfloor + r_{a,e}.$$


So if we add up all the remainders, we get


$$\sum_{e \in E} r_{a,e} = \sum_{e \in E} ae - p \sum_{e \in E} \left\lfloor \frac{ae}{p} \right\rfloor.$$


But we only care about the parity of this sum! So we can remove the whole
piece with $e$ in it, as that’s all even, and we can replace the $-p$ by $1$, since they
are the same modulo 2. This leaves the following much simpler criterion.


Theorem 17.2.8 Eisenstein’s Criterion for the Legendre Symbol. Let
$p$ and $a$ be as throughout, and $E = \{2, 4, 6, \ldots, p-1\}$; then


$$\left(\frac{a}{p}\right) = (-1)^{\sum_{e \in E} \left\lfloor \frac{ae}{p} \right\rfloor}.$$


Remark 17.2.9 The name of the criterion is long to avoid confusion with 
another famous criterion that Eisenstein discovered. (See David Cox’s excellent 
2011 Monthly article [E.7.4], which won the Lester R. Ford award, on whether 
Theodor Schönemann deserves the credit for that criterion.)


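Example 17.2.10 To continue Example 17.2.5 where $p = 11$ and $a = 3$, let’s
compute this exponent:


$$\left\lfloor \frac{6}{11} \right\rfloor + \left\lfloor \frac{12}{11} \right\rfloor + \left\lfloor \frac{18}{11} \right\rfloor + \left\lfloor \frac{24}{11} \right\rfloor + \left\lfloor \frac{30}{11} \right\rfloor = 0 + 1 + 1 + 2 + 2 = 6.$$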


Once again this is even, so 3 is confirmed to be a QR modulo 11. 


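Example 17.2.11 Let’s try to compute the exercise in Example 17.1.5 where
$p = 17$ and $a = 45 \equiv 11$. Then we need to compute this exponent:


$$\left\lfloor \frac{22}{17} \right\rfloor + \left\lfloor \frac{44}{17} \right\rfloor + \left\lfloor \frac{66}{17} \right\rfloor + \left\lfloor \frac{88}{17} \right\rfloor + \left\lfloor \frac{110}{17} \right\rfloor + \left\lfloor \frac{132}{17} \right\rfloor + \left\lfloor \frac{154}{17} \right\rfloor + \left\lfloor \frac{176}{17} \right\rfloor$$
$$= 1 + 2 + 3 + 5 + 6 + 7 + 9 + 10 = 43.$$
This is odd, so 45 is not a QR modulo 17.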
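Theorem 17.2.8 is easy to try out by machine. Here is a minimal sketch in plain Python rather than Sage; the names `eisenstein_exponent`, `legendre_eisenstein`, and `legendre_euler` are made up for this illustration, and $p$ is assumed to be an odd prime not dividing $a$.

```python
def eisenstein_exponent(a, p):
    """Sum of floor(a*e/p) over even e with 0 < e < p."""
    return sum((a * e) // p for e in range(2, p, 2))


def legendre_eisenstein(a, p):
    """Legendre symbol read off from the parity of the exponent above."""
    return 1 if eisenstein_exponent(a, p) % 2 == 0 else -1


def legendre_euler(a, p):
    """Legendre symbol via Euler's Criterion, for comparison."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t


print(eisenstein_exponent(3, 11), legendre_eisenstein(3, 11))     # 6 1
print(eisenstein_exponent(11, 17), legendre_eisenstein(11, 17))   # 43 -1
```

The two printed lines reproduce Example 17.2.10 and Example 17.2.11; comparing against `legendre_euler` for other inputs is a reasonable sanity check.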
This very abstruse-seeming criterion will actually be the key to proving the
soon-to-come Theorem 17.4.1. See Laubenbacher and Pengelley’s article [E.7.8]
for an excellent exposition, which I have expanded on significantly above.


17.3 Using Eisenstein’s Criterion


Let’s calculate for a bit using this criterion. It says that we can tell whether a
number $a$ has a square root modulo $p$ simply by checking whether
$\sum_{\text{even } e,\ 0<e<p} \left\lfloor \frac{ae}{p} \right\rfloor$
is even or odd. So let’s apply it to evaluating $\left(\frac{3}{p}\right)$ for odd primes $p$. Equivalently,
we can answer this question, which we only began answering in Exercise 16.8.15.


Question 17.3.1 When does 3 have a square root modulo p? 


If you liked some of the integer-point counting arguments earlier, you will
like this. For the case $a = 3$, we care about


$$\sum_{\text{even } e,\ 0<e<p} \left\lfloor \frac{3e}{p} \right\rfloor.$$


Said another way, we are adding the integer parts of $\frac{y}{p}$ for $y$ a multiple of six
that is at most $3(p-1)$.


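Example 17.3.2 Let’s try with $p = 7$. We have


$$\left\lfloor \frac{6}{7} \right\rfloor + \left\lfloor \frac{12}{7} \right\rfloor + \left\lfloor \frac{18}{7} \right\rfloor = 0 + 1 + 2 = 3,$$


so Theorem 17.2.8 asks for $(-1)^3 = -1$, so $\left(\frac{3}{7}\right) = -1$ and $3 \not\equiv s^2 \pmod{7}$ for
any $s$.
What about with $p = 11$? Calculating the exponent gives


$$\left\lfloor \frac{6}{11} \right\rfloor + \left\lfloor \frac{12}{11} \right\rfloor + \left\lfloor \frac{18}{11} \right\rfloor + \left\lfloor \frac{24}{11} \right\rfloor + \left\lfloor \frac{30}{11} \right\rfloor = 0 + 1 + 1 + 2 + 2 = 6.$$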


This is even, and we already saw several times that this correctly implies 3 is 
a QR. 

What will a fact like this look like in general? All we care about is the
parity of this sum. So, we can really ignore the terms in the sum that are 0 or
2, as they won’t change the parity! That means we are really only looking at
$\left\lfloor \frac{3e}{p} \right\rfloor$ for $3e$ that are between $p$ and $2p$, since ones less than $p$ go to zero and
there can’t be any number bigger than $3p$ if we only let $e$ go up to $e = p - 1$.
This means we are considering precisely even $e$ such that $p < 3e < 2p$, or
all integers $y$ such that the multiples of 6 give
$$p < 6y < 2p \iff \frac{p}{6} < y < \frac{p}{3}.$$
Notice we have reduced the entire computation to finding the parity of the
cardinality of this small set of integers.
It should be clear that as we think of different $p$, the change in the set of
$y$ would come when $p$ moves above or below a multiple of six. So it seems
reasonable to look at primes of the form $p = 6k + r$ when examining this. That
gives
$$\frac{p}{6} < y < \frac{p}{3} \Rightarrow \frac{6k+r}{6} < y < \frac{6k+r}{3} \Rightarrow k + \frac{r}{6} < y < 2k + \frac{r}{3} \Rightarrow \frac{r}{6} < y < k + \frac{r}{3}.$$


(This works because the cardinality of the sets will be the same if we subtract
integers from the endpoints.)


Claim 17.3.3 Both of the parities we are adding can be easily computed:
• The parity of $k$.
• The parity of the size of the set of integers $y$ such that $\frac{r}{6} < y < \frac{r}{3}$.


The sum of these two parities should be the parity of the set between $\frac{r}{6}$ and
$k + \frac{r}{3}$.

Proof. We will actually compute both parities directly. The parity of $k$ has
two options.


• If $k$ is even, then $k = 2\ell$ and $p = 6k + r = 12\ell + r$.
• If not, then $k = 2\ell + 1$ and $p = 6k + r = 12\ell + 6 + r$.


To compute the second part, we first note that for prime $p > 3$, the only possible
residues $r$ modulo 6 are $r = 1$ or $r = 5$.


• If $r = 1$, we are looking for $y$ such that $\frac{1}{6} < y < \frac{1}{3}$, of which there are
none.


• If $r = 5$, we are looking for $y$ such that $\frac{5}{6} < y < \frac{5}{3}$, of which there is one.

Proposition 17.3.4 Three is a quadratic residue (or not) in the following
circumstances.


• $\left(\frac{3}{p}\right) = 1$ if $p \equiv \pm 1 \pmod{12}$


• $\left(\frac{3}{p}\right) = -1$ if $p \equiv \pm 5 \pmod{12}$
Proof. Combine the facts in Claim 17.3.3. We see that


• If $p = 12\ell + 1$ we add two even numbers, so 3 is a QR.
• If $p = 12\ell + 5$, we add an even number and 1, so 3 is not a QR.


• If $p = 12\ell + 6 + 1 = 12\ell + 7$, we add an odd and zero, so 3 is not a QR.


• If $p = 12\ell + 6 + 5 = 12\ell + 11$, we add an odd and 1, which is even, so 3
is a QR.


∎
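The counting argument behind Proposition 17.3.4 can be replayed directly. This is a hedged sketch in plain Python, with helper names (`is_prime`, `count_between`, `three_is_qr`) invented here; it counts the integers $y$ with $\frac{p}{6} < y < \frac{p}{3}$ and uses only the parity of that count.

```python
def is_prime(n):
    """Naive trial division, fine for small n."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def count_between(p):
    """Number of integers y with p/6 < y < p/3 (written without fractions)."""
    return sum(1 for y in range(1, p) if 6 * y > p and 3 * y < p)


def three_is_qr(p):
    """3 is a QR modulo the prime p > 3 exactly when the count is even."""
    return count_between(p) % 2 == 0


for p in [n for n in range(5, 50) if is_prime(n)]:
    # Each row shows p, its residue modulo 12, and whether 3 is a QR.
    print(p, p % 12, three_is_qr(p))
```

Rows with residue 1 or 11 should show `True`, and rows with residue 5 or 7 should show `False`.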
Try it! 


@interact
def _(p=prime_range(5,50)):
    L = solve_mod(x^2==3,p)
    pretty_print(html(r"$%s\equiv %s\text{ (mod }12)$ and $\left(\frac{3}{%s}\right)=%s$"%(p,p%12,p,legendre_symbol(3,p))))
    if L:
        pretty_print(html(r"And it turns out $%s^2\equiv %s$, $%s^2\equiv %s$ (mod $%s$)"%(L[0][0],L[0][0]^2,L[1][0],L[1][0]^2,p)))


Compare to Exercise 16.8.15 as well as Example 16.5.4. 


17.4 Quadratic Reciprocity 


Now, if we had to do this prime by prime, it would still be horrible. Instead, we
will end up computing all Legendre symbols $\left(\frac{a}{p}\right)$ with $a \neq -1, 2$ by reducing
them to $\left(\frac{-1}{p}\right)$ or $\left(\frac{2}{p}\right)$ using techniques from Section 17.1 and the main theorem
of the chapter.

As we’ve already alluded to more than once, this theorem is venerable. Parts were conjectured
and proved by Euler, and all of it was conjectured by Legendre in terms
of remainders (some commentators say he proved it as well). Carl Friedrich 
Gauss provided no fewer than eight proofs over the course of his lifetime. See 
Subsection 17.6.3 for a few more comments. 


17.4.1 The theorem 


Theorem 17.4.1 Quadratic Reciprocity. If $p$ and $q$ are odd primes not
equal to each other, then


$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$


Proof. See Section 17.6. ∎

Remark 17.4.2 Note that the exponent has fractions, not Legendre symbols!
We can multiply them to rewrite the exponent in a way some authors prefer:


$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{(p-1)(q-1)}{4}}.$$


Example 17.4.3 Computing with QR. We immediately apply this to 
vastly simplify the calculations in Section 17.3. Let $q = 3$ and $p > 3$.
Let’s write the theorem out for this case. Since $(3-1)/2 = 1$, we have


$$\left(\frac{3}{p}\right)\left(\frac{p}{3}\right) = (-1)^{(p-1)/2}, \text{ or } \left(\frac{3}{p}\right) = (-1)^{(p-1)/2} \left(\frac{p}{3}\right).$$


There are two parts to this:


• Since $1 \in Q_3$ and $2 \notin Q_3$, the Legendre symbol on the right is:


$\left(\frac{p}{3}\right) = 1$ if $p \equiv 1 \pmod{3}$ and $\left(\frac{p}{3}\right) = -1$ if $p \equiv 2 \pmod{3}$.


• We can also compute the power of $-1$:
$(-1)^{(p-1)/2} = 1$ if $p \equiv 1 \pmod{4}$ and $(-1)^{(p-1)/2} = -1$ if $p \equiv 3 \pmod{4}$.
Combine these together and we get that $\left(\frac{3}{p}\right) = 1$ exactly when one of these
two cases occurs:
• $p \equiv 1 \pmod{3}$ and $p \equiv 1 \pmod{4}$
• $p \equiv 3 \pmod{4}$ and $p \equiv 2 \pmod{3}$


This is precisely $p \equiv 1, 11 \equiv \pm 1 \pmod{12}$ as in Proposition 17.3.4!
It’s amazing that this can work so easily. Compare to all of Example 16.5.4, 
Exercise 16.8.15, and Proposition 17.3.4. 


17.4.2 Why is this theorem different from all other theo- 
rems? 


17.4.2.1 What does it mean? 


What does the term “quadratic reciprocity” even mean? 

It means that there is a reciprocating relationship³ between Legendre symbols,
and hence between whether there is a square root of two primes modulo
each other. 

One way to think of this relation is to assert that Table 17.4.4 is almost 
symmetric about the (empty) diagonal — and that we have a simple formula 
for finding where it isn’t symmetric. 


3 There are vast generalizations of these laws that take the reciprocation to a very deep
level; see [E.4.23, Chapter 19] for an accessible and engaging take on quadratic reciprocity
in this context.

Table 17.4.4 Quadratic reciprocity as near symmetry of table of $\left(\frac{p}{q}\right)$


p\q |   3   5   7  11  13  17  19  23  29  31  37
----+---------------------------------------------
  3 |      -1  -1   1   1  -1  -1   1  -1  -1   1
  5 |  -1      -1   1  -1  -1   1  -1   1   1  -1
  7 |   1  -1      -1  -1  -1   1  -1   1   1   1
 11 |  -1   1   1      -1  -1   1  -1  -1  -1   1
 13 |   1  -1  -1  -1       1  -1   1   1  -1  -1
 17 |  -1  -1  -1  -1   1       1  -1  -1  -1  -1
 19 |   1   1  -1  -1  -1   1      -1  -1   1  -1
 23 |  -1  -1   1   1   1  -1   1       1  -1  -1
 29 |  -1   1   1  -1   1  -1  -1   1      -1  -1
 31 |   1   1  -1   1  -1  -1  -1   1  -1      -1
 37 |   1  -1   1   1  -1  -1  -1  -1  -1  -1


Try making bigger tables (represented as matrices) in the Sage cell below. 


ls = prime_range(3,40)
M = matrix(len(ls), [legendre_symbol(a,b) for a in ls for b in ls])
show(block_matrix(2, [0, matrix(1,len(ls),ls), matrix(len(ls),1,ls), M]))


Remark 17.4.5 Here is another way to say it. For odd primes $p$ and $q$,
$$\left(\frac{p}{q}\right) = \left(\frac{q}{p}\right)$$
except when $p \equiv q \equiv 3 \pmod{4}$. Or see Remark 17.4.2 for yet another way;
both are often how Theorem 17.4.1 is stated in texts.
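The near symmetry described here can be checked for the primes in Table 17.4.4. The following is a sketch in plain Python, assuming Euler's Criterion as the way to evaluate each symbol; the function names are chosen only for this illustration.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's Criterion."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t


def reciprocity_holds(p, q):
    """Check the statement of Theorem 17.4.1 for one pair of odd primes."""
    exponent = ((p - 1) // 2) * ((q - 1) // 2)
    return legendre(p, q) * legendre(q, p) == (-1) ** exponent


primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

# Collect the pairs where the table is NOT symmetric.
asymmetric = [(p, q) for p in primes for q in primes
              if p < q and legendre(p, q) != legendre(q, p)]
print(all(p % 4 == 3 and q % 4 == 3 for p, q in asymmetric))   # True
```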


17.4.2.2 What does it do? 


What does quadratic reciprocity do? 


It makes computation of Legendre symbols $\left(\frac{a}{p}\right)$ very, very easy if you have
a prime factorization of $a$ (and all the intermediate steps). You just need to
use the following facts we already proved, in addition to quadratic reciprocity.


• $\left(\frac{-1}{p}\right) = 1 \iff p \equiv 1 \pmod{4}$


• $\left(\frac{2}{p}\right) = 1 \iff p \equiv \pm 1 \pmod{8}$


Algorithm 17.4.6 Any Legendre symbol can be computed using the following 
steps, not necessarily in this order and often multiple times: 


• Factor the top and use Proposition 16.4.7, then compute each factor separately.


• Reduce modulo the bottom and/or use Proposition 17.1.8 to get convenient
tops (especially perfect squares).


• When you get to an odd prime on the top and bottom, use Theorem 17.4.1.


• When the top is $-1$ or $2$, use Example 16.6.2 or Theorem 16.7.1 to finish
your computation.
Proof. Read the chapter up to this point, plus the proof of Theorem 17.4.1. ∎


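Example 17.4.7 Let’s calculate $\left(\frac{99}{167}\right)$.


• Since they are coprime factors, $\left(\frac{99}{167}\right) = \left(\frac{9}{167}\right) \cdot \left(\frac{11}{167}\right)$.


• Since both 11 and 167 are prime and congruent to 3 modulo four,
$\left(\frac{9}{167}\right) \cdot \left(\frac{11}{167}\right) = \left(\frac{3^2}{167}\right) \cdot -\left(\frac{167}{11}\right)$.


• Reducing, we get $\left(\frac{3^2}{167}\right) \cdot -\left(\frac{167}{11}\right) = -1 \cdot \left(\frac{2}{11}\right)$.


• Finally, we use Theorem 16.7.1 and note that $11 \equiv 3 \pmod{8}$ to get
$-1 \cdot \left(\frac{2}{11}\right) = -1 \cdot -1 = 1$ and we see that ninety-nine is a QR modulo one
hundred sixty-seven.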


Example 17.4.8 In a classroom experience, try these. (Else, see Exer- 
cise 17.7.16.) 


• $\left(\frac{83}{103}\right)$
• $\left(\frac{219}{383}\right)$
• $\left(\frac{646}{877}\right)$


And we can check them, of course. 


print(legendre_symbol(83,103))
print(legendre_symbol(219,383))
print(legendre_symbol(646,877))


1 
1 
-1 


We can also come up with congruence criteria like above for other primes. See 
the exercises, such as 17.7.19 and 17.7.20. 
17.4.2.3 The Jacobi symbol 


What else does quadratic reciprocity do? Indirectly, it allows us to compute 


Legendre symbols $\left(\frac{a}{p}\right)$ without factoring $a$.


Definition 17.4.9 Let $n$ be an odd number which factors as


$$n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}.$$


Then the Jacobi symbol, $\left(\frac{a}{n}\right)$, is just the product of the relevant Legendre
symbols:


$$\left(\frac{a}{n}\right) = \left(\frac{a}{p_1}\right)^{e_1} \left(\frac{a}{p_2}\right)^{e_2} \cdots \left(\frac{a}{p_k}\right)^{e_k}.$$

Amazingly, the Jacobi symbol has all the same properties⁴ the Legendre
symbol has, even quadratic reciprocity and the values for $a = -1, 2$ (see
Exercise 17.7.12). Moreover, if $\left(\frac{a}{n}\right) = -1$ then $a$ is not a QR of $n$. (Showing
all this essentially just uses Chinese Remainder Theorem and Hensel’s Lemma,
but we will not go into details here.)

The only thing not the same as for Legendre symbols is this: 


Fact 17.4.10 If $n$ is not prime, then $\left(\frac{a}{n}\right) = 1$ does not necessarily imply $a$ is
a QR of $n$.
Proof. See Exercise 17.7.13. ∎
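A small concrete instance of Fact 17.4.10 may help. The sketch below is plain Python with made-up helper names; it uses $a = 8$ and $n = 15 = 3 \cdot 5$, computes the Jacobi symbol straight from Definition 17.4.9, and then searches for square roots by brute force.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's Criterion."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t


def is_square_mod(a, n):
    """Brute-force test of whether x^2 = a (mod n) has a solution."""
    return any((x * x - a) % n == 0 for x in range(n))


# Jacobi symbol (8/15) as a product of Legendre symbols, since 15 = 3 * 5.
jacobi_8_15 = legendre(8, 3) * legendre(8, 5)
print(jacobi_8_15, is_square_mod(8, 15))   # 1 False
```

Both Legendre symbols are $-1$ here, so their product is $1$ even though 8 has no square root modulo 15.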


Sage note 17.4.11 Names of functions may vary. In Sage, this is named 
after yet another generalization called the Kronecker symbol. 


print(kronecker_symbol(8,15))
print(quadratic_residues(15))


The goal of introducing the Jacobi symbol is not to use the definition to do 
anything. That would be pointless. 

Instead, you can use the Jacobi symbol to help calculate Legendre symbols! 
After all, they follow almost all the same rules. You’d only need to factor here 
in order to make sure you don’t have an even number in the denominator of 
the symbol. 

It turns out this leads to an algorithm which needs only about the square 
of the number of digits of p steps to evaluate a given symbol. Generically this 
is far fewer steps than one would need if one had to factor first (as far as we 
currently know). 
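To make the idea of evaluating symbols without factoring concrete, here is one standard way to organize the computation, sketched in plain Python. This is not Sage's implementation; the function name `jacobi` is chosen for this illustration, and the only number ever factored is 2 (pulled out of the top).

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for a positive odd n, using only reduction,
    the rule for 2 on top, and reciprocity (no factoring of a)."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):          # (2/n) = -1 exactly when n = 3, 5 (mod 8)
                result = -result
        a, n = n, a                      # flip the symbol ...
        if a % 4 == 3 and n % 4 == 3:    # ... with a sign change if both are 3 (mod 4)
            result = -result
        a %= n                           # reduce the new top modulo the new bottom
    return result if n == 1 else 0       # 0 means a and n were not coprime


print(jacobi(99, 167), jacobi(646, 877))   # 1 -1
```

The printed values agree with Example 17.4.7 and with the Sage output for $\left(\frac{646}{877}\right)$ above.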

Some examples, like (4), would be just as fast doing it either way. But 
others would be much slower, because you’d have to factor several times. Here’s 
an example; note that 943 is not prime. 


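Example 17.4.12
$$\left(\frac{943}{997}\right) = \left(\frac{997}{943}\right) \text{ since } 997 \equiv 1 \pmod{4}$$
$$= \left(\frac{54}{943}\right) = \left(\frac{2}{943}\right)\left(\frac{27}{943}\right) = \left(\frac{27}{943}\right) \text{ since } 943 \equiv -1 \pmod{8}$$
$$= -\left(\frac{943}{27}\right) \text{ since both are } \equiv 3 \pmod{4}$$
$$= -\left(\frac{25}{27}\right) = -1 \text{ because } 25 = 5^2$$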


And we can check this out with Sage: 


kronecker_symbol(943,997)


-1 


Compare this example with having to first factor 943 and then still do the 
whole reciprocity dance. Also, this strategy is much easier to implement on 


a computer for automatic evaluation. (By the way, factoring $943 = 23 \cdot 41$ is
itself not a gimme ‘by hand’.)

4 en.wikipedia.org/wiki/Jacobi_symbol

Before we go on, if you haven’t tried to compute lots of things with quadratic
reciprocity, don’t go on until you do. You won’t appreciate the power
and usefulness of the proof until you’ve struggled with some ‘by hand’. It’s
just the way these things are.


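Example 17.4.13 To put this into practice, let’s redo $\left(\frac{646}{877}\right)$:


$$\left(\frac{646}{877}\right) = \left(\frac{2}{877}\right)\left(\frac{323}{877}\right) = (-1)\left(\frac{323}{877}\right) \text{ since } 877 \equiv 5 \pmod{8}$$
$$= -\left(\frac{877}{323}\right) = -\left(\frac{231}{323}\right) \text{ since } 877 \equiv 1 \pmod{4}$$
$$= \left(\frac{323}{231}\right) = \left(\frac{92}{231}\right) \text{ since both are } \equiv 3 \pmod{4}$$
$$= \left(\frac{4}{231}\right)\left(\frac{23}{231}\right) = \left(\frac{23}{231}\right) \text{ since } 4 = 2^2$$
$$= -\left(\frac{231}{23}\right) = -\left(\frac{1}{23}\right) = -1 \text{ since both are } \equiv 3 \pmod{4}.$$


It’s good practice to see above where the Jacobi symbol (and not just Legendre
symbol) was used. We also check again with Sage: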


kronecker_symbol(646,877)


-1 


17.5 Some Surprising Applications of QR 


What else can quadratic reciprocity do? The answer is, a lot. This section 
collates various interesting applications of QR, as well as some places where 
being able to efficiently calculate quadratic residues by its means is generally 
helpful. 


17.5.1 Factoring, briefly 


As an example, it can help us with factoring large integers n; Gauss used it. 
The process itself is a little too long to describe here, but it’s important to get 
the flavor. 

The essential idea is that if $a$ is a QR of $n$, then $a$ is a QR of any prime
$p \mid n$. QRs often have congruence conditions associated with them, so $n$ must
obey all of the congruence conditions for $\left(\frac{a}{p}\right)$ for all the $p$ which divide it.
This might be a lot of conditions, which narrows the field considerably.

Then we can use a variant on the Fermat factoring method to check for 
possible a for which a prime divisor p of n definitely is or definitely is not a QR 
(again, quadratic reciprocity can help), and then one can compute Legendre/ 
Jacobi symbols of possible p | n to reduce to just having to check a very few 
bigger possible prime factors.


17.5.2 Primality testing


Another application is that it can help us check primality. For instance, a
test similar in spirit to the Miller-Rabin (probabilistic) primality test, but
which uses Legendre/Jacobi symbols, is the Solovay-Strassen test⁵. (See Exercise 17.7.22.)

A specific example where quadratic reciprocity is helpful is with the so-called
Fermat numbers. Recall (Subsection 12.1.1) that Euler blasted the following
conjecture of Fermat’s out of the water by disproving it for $n = 5$:


$F_n = 2^{2^n} + 1$ is always prime for $n > 0$.


But what about bigger $F_n$; surely they are inaccessible to the usual factoring
techniques?

Analogously to Mersenne numbers (Subsection 12.1.3), for which the Lucas-Lehmer
test can check for primality (remember GIMPS), there is a test called
Pépin’s test which can check for primality of Fermat numbers. (Pépin did
this work in the late 1800s.) It turns out that no bigger Fermat numbers have
turned out to be prime, all the way through $n = 31$. See the Distributed Search
for Fermat Number Divisors⁶ or http://ww.prothsearch.com/fermat.html for
which Fermat numbers still need more factors⁷, or the relevant member of the
excellent Prime Pages⁸.

Here is the test implemented naively in Sage: 


@interact
def _(n=(1,[1..6])):
    F = 2^(2^n)+1
    pretty_print(html(r"The $%s$th Fermat number is $%s$"%(n,F)))
    test = mod(3,F)^((F-1)/2)
    if test == -1:
        pretty_print(html(r"Since $3^{(%s-1)/2}\equiv %s$, this Fermat number is prime"%(F,test)))
    else:
        pretty_print(html(r"Since $3^{(%s-1)/2}\equiv %s$, this Fermat number is not prime"%(F,test)))


You can already see from this code that it is checking Euler’s criterion
modulo $F_n$, and looking for a negative answer. Why would this test primality?
Let’s formally state and prove the criterion.


Fact 17.5.1 Pépin’s Test. For $n > 0$, $F_n = 2^{2^n} + 1$ is prime exactly when


$$3^{2^{2^n - 1}} \equiv -1 \pmod{2^{2^n} + 1}.$$
Proof. We will try to connect this with Euler’s Criterion. Note that $(F_n - 1)/2 = 2^{2^n - 1}$,
the power of three in the statement.
First, let’s assume $F_n$ is prime. Since $F_n$ is one more than a multiple of four,
clearly

$F_n \equiv 1, 5, \text{ or } 9 \pmod{12}$.
Let’s examine a few cases.


• If $F_n \equiv 1 \pmod{12}$, then $3 \mid 2^{2^n} = F_n - 1$, which cannot be true.


5 en.wikipedia.org/wiki/Solovay–Strassen_primality_test

6 www.fermatsearch.org/factors/faclist.php

7 As of this writing $F_{20}$ has been known to be composite for over thirty years, yet we still
do not know any of its factors.

8 t5k.org/glossary/page.php?sort=FermatNumber

• If $F_n \equiv 9 \pmod{12}$, then $F_n$ is a number greater than three which is
divisible by three, but it’s prime, so that’s not possible.


• So $F_n \equiv 5 \pmod{12}$.


Since $F_n$ is prime, that means by Proposition 17.3.4 we know 3 is not a QR mod
$F_n$. (Quadratic reciprocity is implicit here, though we happened to calculate
this before we had stated it.) Thus Theorem 16.5.2 should give that $3^{(F_n - 1)/2} \equiv -1$.

For the converse, let’s assume that Euler’s Criterion gives this answer of $-1$
for $a = 3$. Then square both sides to get


$$3^{F_n - 1} \equiv 1 \pmod{p}$$
for all primes $p$ dividing $F_n$. Now, what order does 3 have here?


• Since $F_n - 1 = 2^{2^n}$, that means 3 has order some power of 2 (in $U_p$).


• But 3 can’t have order $2^{2^n - 1}$ (or less), because it isn’t the identity when
taken to that power.


• So it must have order $2^{2^n}$.


The only way 3 can have that big an order is if $p$ is at least $2^{2^n} + 1 = F_n$. So
since $p \mid F_n$, they must be equal! ∎
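For readers working outside of Sage, Pépin's test is only a few lines of plain Python, since the built-in three-argument `pow` does modular exponentiation. The function name `pepin_is_prime` is invented for this sketch, and it assumes $n > 0$ as in Fact 17.5.1.

```python
def pepin_is_prime(n):
    """Pepin's test for the Fermat number F_n = 2^(2^n) + 1, with n > 0."""
    F = 2 ** (2 ** n) + 1
    # F is prime exactly when 3^((F-1)/2) is congruent to -1 modulo F.
    return pow(3, (F - 1) // 2, F) == F - 1


for n in range(1, 9):
    print(n, pepin_is_prime(n))
```

The first four values of $n$ should report `True` and the rest `False`, matching the known Fermat primes 5, 17, 257, and 65537.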


Remark 17.5.2 Interestingly, Mersenne numbers can sometimes also be
shown to be composite using quadratic residues. For instance, $2^p - 1$ with
exponent $p \equiv 3 \pmod{4}$ which is itself a Germain prime must be composite.
See [E.2.13, Theorem 7.6], and see [E.2.4, Exercises 9.1.37-40] for many more
criteria like this.


17.5.3 Yes, even cryptography 


Suppose we have two primes $p$ and $q$ that are both of the form $4n + 3$. Then
it should (probabilistically) be possible to find a number $a$ such that


$$\left(\frac{a}{p}\right) = -1 = \left(\frac{a}{q}\right), \text{ so that } \left(\frac{a}{pq}\right) = 1,$$


where the latter symbol is a Jacobi symbol (recall Definition 17.4.9).

Then the Goldwasser-Micali cryptosystem⁹ uses the fact that it isn’t obvious
whether a Jacobi symbol which equals one implies $a$ is actually a quadratic
residue to create a public-key cryptosystem.

Now, does this really use quadratic reciprocity? It’s true that decryption 
is possible using criteria like Euler’s if you have the factorization n = pq, and 
the Legendre/Jacobi symbol would be multiplicative with or without Theo- 
rem 17.4.1. But to my mind one wouldn’t have even had the thought to create 
such a system (or even the Jacobi symbol itself) without the full theorem, so 
it seems appropriate to mention this application here. 


17.5.4 Solving equations 


There is even more! As one example, quadratic reciprocity (or at least the 
Legendre symbol, which we most easily compute using reciprocity) helps us 
solve Mordell equations. For instance, Fact 15.3.3 and similar facts implicitly 


9 en.wikipedia.org/wiki/Goldwasser–Micali_cryptosystem
use (3). The next easiest cases use (2) and multiplicativity. But more


advanced ones need to compute more complicated square roots. Here are two 
examples, without proof. 


• The equation $x^3 = y^2 + 16$ has no integer solutions. (Uses $\left(\frac{-2}{p}\right)$.)


• The equation $x^3 = y^2 - 46$ has no integer solutions. (Uses a Legendre symbol as well.)
There are many others solvable with the help of knowledge of values of the 
Legendre symbol. See for example [E.4.6, Theorem 9.12] or [E.2.8, Section 
7.4C], the latter of which explicitly uses quadratic reciprocity. 


17.5.5 Artin’s conjecture 


Let’s return to the test for $F_n$’s primality in Fact 17.5.1. A careful look at
the proof shows that 3 is a primitive root for $F_n$, if $F_n$ is prime. Thus, if we
had infinitely many Fermat primes (and not just five of them), we’d have an
integer which is a primitive root of infinitely many primes.

Such would provide a proof of at least one explicit case for the following 
long-standing question. 


Conjecture 17.5.3 Artin’s Conjecture. Every nonsquare integer except 
—1 is a primitive root for infinitely many primes. 
This conjecture is interesting for several reasons. 


• Although it is mostly believed to be true, currently there are no integers
known to be a primitive root for infinitely many primes.


• Weirder, it is known that at least one of 3, 5, or 7 is a primitive root for
infinitely many primes, but we don’t know which one!


• Weirdest, it has been proved that there are at most two exceptions to
this conjecture, yet we also know of no integers which do not satisfy it!


That is, there are at most two nonsquare integers which are not a primitive
root for infinitely many primes, yet we do not have a single specific
integer which we can prove that for.


There is some historical connection as well. Gauss spent some time investigating
the patterns of repetitions in simple decimal expansions of fractions,
like $\frac{1}{3} = .333\ldots$ or $\frac{2}{7} = .285714285714\ldots$. It turns out that this is directly connected
to whether 10 is a primitive root for a given prime (see Exercise 17.7.21).
Likewise, when Euler found that $F_5 = 4294967297$ was composite (recall Subsection 12.1.1)
he would have been helped along quite a bit by information
about this conjecture, as his proof looked directly at factors of powers of 2
(plus one) and their possible form, not powers of 3.


@interact
def _(n=(1,[1..6])):
    F = 2^(2^n)+1
    a = mod(3,F)
    if a.multiplicative_order()==F-1:
        pretty_print(html("$3$ is a primitive root of $F_{%s}=%s$"%(n,F)))
    else:
        pretty_print(html("Not prime, no primitive root!"))

We can use these ideas to find another possible way to attack Artin’s Conjecture.
It’s not directly related to reciprocity per se, but still connects all our
theoretical ideas of the last several sections.


Example 17.5.4 We put this in the form of several steps. Verifying several 
facts in these steps is left to Exercise Group 17.7.8-11. 

Recall from the very end of Section 11.6 that if $q$ and $p = 2q + 1$ are both odd
primes, then we call $q$ a Germain prime. In that case, every residue of $p$ other
than $a = -1$ and $a = 0$ is a primitive root or a QR. One way to interpret this is
as complementing Fact 16.4.5, which characterizes even powers of a primitive
root as being QRs; namely, for $p$ nearly all odd powers must be primitive roots.

Such a prime p must be of the form p = 3 (mod 4). This follows because q 
is odd so q = 2k + 1 for some integer k, yielding 


p= 2(2k+1)+1=4k+3. 


(This is how we know that —1, which is clearly not a primitive root, also isn’t 
a QR; recall Fact 16.1.2.) 

In this case, not only are all residues other than $0, -1$ either a primitive
root or a QR, but $a$ is one of these things precisely when $p - a$ is the other. We
know that

$$1^2, 2^2, \ldots, q^2$$


are all different modulo $p$, and of course all of these are QRs (and so not
primitive roots).

Here is the key; that means that the additive inverses of perfect squares,
$p - k^2$, for $2 \leq k \leq q$, must all be primitive roots. The smallest of these, $p - 4$,
must thus be a primitive root for any such (safe; recall Subsection 11.6.4) prime
$p = 2q + 1$.

So if there were infinitely many such Germain primes, we would also have 
an explicit example of Artin’s conjecture ... but, so far, no such luck. 
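The claim that $p - 4$ is a primitive root for a safe prime $p = 2q + 1$ can be spot-checked without computing full multiplicative orders. The sketch below is plain Python with invented helper names; it relies on the fact that when $p - 1 = 2q$ with $q$ prime, the only possible orders are $1$, $2$, $q$, and $2q$.

```python
def is_prime(n):
    """Naive trial division, fine for small n."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def is_primitive_root_safe(a, q):
    """For p = 2q + 1 with q an odd prime, a has order 2q exactly when
    neither a^2 nor a^q is congruent to 1 modulo p."""
    p = 2 * q + 1
    return pow(a, 2, p) != 1 and pow(a, q, p) != 1


# Odd Germain primes q below 100, with their safe primes p = 2q + 1.
germain = [q for q in range(3, 100) if is_prime(q) and is_prime(2 * q + 1)]
for q in germain:
    p = 2 * q + 1
    print(q, p, is_primitive_root_safe(p - 4, q))
```

Every row should end in `True`.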

The largest currently known¹⁰ (as of this writing, discovered in early 2016)
Germain prime, due to James Scott Brown, is


$$2618163402417 \cdot 2^{1290000} - 1$$


which is a number with close to four hundred thousand digits. (The previous
record had about half as many, so this is a huge advance.)


@interact
def _(q=(11,[r for r in prime_range(3,100) if is_prime(2*r+1)])):
    p = 2*q+1
    a = mod(p-4,p)
    if a.multiplicative_order()==p-1:
        pretty_print(html("$-4$ is a primitive root of $%s$"%p))
    else:
        pretty_print(html("Mistake!"))


17.6 A Proof of Quadratic Reciprocity 


You are most likely now exhausted by the many applications and uses of qua- 
dratic reciprocity. Now we must prove it. 


10 t5k.org/top20/page.php?sort=SophieGermain

Recall the statement (Theorem 17.4.1): For odd primes $p$ and $q$, we have
that
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$


That is to say, the Legendre symbols are the same unless $p$ and $q$ are both of
the form $4k + 3$.

Before beginning, let’s recall the tools we will need on our journey. First,
$p$ and $q$ are odd primes in the context of this proof. Also, we will use
Eisenstein’s criterion (Theorem 17.2.8) from earlier in the chapter. With that in mind,
let
$$R = \sum_{\text{even } e,\ 0<e<p} \left\lfloor \frac{qe}{p} \right\rfloor$$
be the exponent in question, so that
$$\left(\frac{q}{p}\right) = (-1)^R.$$

17.6.1 Re-enter geometry 


The key to our proof will be geometrically interpreting $\left\lfloor \frac{qe}{p} \right\rfloor$. We can think of
it as being the biggest integer less than $\frac{qe}{p}$, which means we can think of it as
an integer height.

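The following features are present in the next graphic, which should clarify
how we’ll think of it geometrically. Each type of object is highlighted with a
different color.
• The line through the origin with slope $q/p$ (dotted blue).


• All the grid points in (not on) the box of width $p$ and height $q$ (box red,
points black).


• Points with even $x$-coordinate corresponding to the highest that one can
get while staying under the line of slope $q/p$ (points blue).


• The box of width $\frac{p-1}{2}$ and height $\frac{q-1}{2}$ (green), which we’ll need in a
moment.

Figure 17.6.1 Representing $\left\lfloor \frac{qe}{p} \right\rfloor$ geometrically


It should be clear that each blue stack has the same height as $\left\lfloor \frac{qe}{p} \right\rfloor$ for some
even $e$. Check that for the case $(p = 11, q = 7)$ in Figure 17.6.1 we should have
a total of
$$\left\lfloor \frac{7 \cdot 2}{11} \right\rfloor + \left\lfloor \frac{7 \cdot 4}{11} \right\rfloor + \left\lfloor \frac{7 \cdot 6}{11} \right\rfloor + \left\lfloor \frac{7 \cdot 8}{11} \right\rfloor + \left\lfloor \frac{7 \cdot 10}{11} \right\rfloor$$
$$= \left\lfloor \frac{14}{11} \right\rfloor + \left\lfloor \frac{28}{11} \right\rfloor + \left\lfloor \frac{42}{11} \right\rfloor + \left\lfloor \frac{56}{11} \right\rfloor + \left\lfloor \frac{70}{11} \right\rfloor$$
$$= 1 + 2 + 3 + 5 + 6 = 17 \equiv 1 \pmod{2},$$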


which makes sense since 7 and 11 are both congruent to 3 modulo four, so the 
Legendre symbols would be opposing. 

The core point of the overall proof is to convince ourselves of the following 
geometric claim: 


Claim 17.6.2 The number of blue points (which is R) has the same parity as 
the total number of positive points in and on the green box which are under 
the dotted line. 

Proof. See Subsection 17.6.2. ∎


Along with Eisenstein, we call this second number $\mu$. One may note that


$$\mu = \sum_{f=1}^{(p-1)/2} \left\lfloor \frac{qf}{p} \right\rfloor.$$


When I first saw this proof, it seemed pretty opaque. I highly recommend
getting online and trying the interactive version of the graphic below to convince
yourself of the plausibility of Claim 17.6.2, or at the very least that $R$
and $\mu$ really are given as claimed.
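If the interactive graphic is not at hand, the parity statement of Claim 17.6.2 can at least be tested numerically. This is a small sketch in plain Python; the function names `R` and `mu` simply mirror the symbols in the text and are not part of any library.

```python
def R(p, q):
    """Number of blue points: sum of floor(q*e/p) over even e, 0 < e < p."""
    return sum((q * e) // p for e in range(2, p, 2))


def mu(p, q):
    """Positive lattice points in and on the green box under the line:
    sum of floor(q*f/p) for f = 1, ..., (p-1)/2."""
    return sum((q * f) // p for f in range(1, (p - 1) // 2 + 1))


# The case pictured in Figure 17.6.1.
print(R(11, 7), mu(11, 7))   # 17 7
```

Both numbers are odd here, as the claim predicts; other pairs of distinct odd primes can be tried the same way.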


@interact
def _(p=(11,prime_range(3,100)),q=(7,prime_range(3,100))):
    E = [2,4..p-1]
    plot4 = plot((q/p)*x,(x,0,p),linestyle='--')
    plot3 = line([[0,0],[p,0],[p,q],[0,q],[0,0]], rgbcolor=(1,0,0))
    plot2 = line([[0,0], [(p-1)/2,0], [(p-1)/2,(q-1)/2], [0,(q-1)/2], [0,0]], color='green')
    grid_pts_1 = [[i,j] for i in [1..p-1] for j in [1..q-1]]
    grid_pts_2 = [[i,j] for i in [1..(p-1)/2] for j in [1..(q-1)/2]]
    plot_grid_pts = points(grid_pts_1,rgbcolor=(0,0,0),pointsize=10)
    lattice_pts1 = [coords for coords in grid_pts_1 if (coords[0]*q-coords[1]*p>0 and coords[0]<p and coords[0] in E)]
    if len(lattice_pts1)!=0:
        plot_lattice_pts1 = points(lattice_pts1, rgbcolor=(0,0,1),pointsize=20)
    else:
        plot_lattice_pts1 = Graphics()
    show(plot2+plot3+plot4 + plot_grid_pts + plot_lattice_pts1, xmax=p, ymax=q, ymin=0)
    forms = '$'+'+'.join([r'\left\lfloor\frac{%s\cdot %s}{%s}\right\rfloor'%(q,e,p) for e in E])+'$'
    pretty_print(html("The blue dots represent "+forms))
    forms2 = '$'+'+'.join([r'\left\lfloor\frac{%s}{%s} \right\rfloor'%(q*e,p) for e in E])
    forms3 = '+'.join(['%s'%(floor(q*e/p)) for e in E])+r'=%s\equiv %s\text{ (mod }2)$'%(sum([floor(q*e/p) for e in E]),sum([floor(q*e/p) for e in E])%2)
    pretty_print(html("This simplifies to "+forms2+'='+forms3))


Once the geometry is out of the way, we are almost there. 


Claim 17.6.3 Suppose that we have proved Claim 17.6.2. Then we can quickly 
prove Quadratic Reciprocity. 

Proof. Essentially all we do is take the previous claim and use it for both 
Legendre symbols; then we add and get the result. Let’s see Claim 17.6.2 in 
action for each symbol. 


• First, to get $\left(\frac{q}{p}\right)$, we can safely ignore $R$ to just focus on the number
(indeed, parity) of $\mu$, the number of positive lattice points below the
dotted line in and on the green box.


• The same argument applies to $\left(\frac{p}{q}\right)$; we can safely ignore the exponent


$$\sum_{\text{even } e',\ 0<e'<q} \left\lfloor \frac{pe'}{q} \right\rfloor$$


and instead focus on the number (indeed, parity) of positive lattice points
in and on the green box to the left of the dotted line, which we may for
convenience call $\mu'$.


A useful way to think about this is that the previous two steps switch the role
of the vertical and horizontal axes.
Now consider the total exponent of $-1$ we expect from $\left(\frac{q}{p}\right)\left(\frac{p}{q}\right)$. It will be
the sum of those two amounts $\mu + \mu'$, which, geometrically, is the number of
(positive, still) points in and on the green box. (There is no overlap, because
$q$ and $p$ are coprime, so there are no lattice points on the dotted line until we
get to $(p, q)$, which is well outside the green box.)
How many total points is this? The green box, by design, has dimensions $\frac{p-1}{2}$
and $\frac{q-1}{2}$, so that would mean


$$\sum_{\text{even } e,\ 0<e<p} \left\lfloor \frac{qe}{p} \right\rfloor + \sum_{\text{even } e',\ 0<e'<q} \left\lfloor \frac{pe'}{q} \right\rfloor \equiv \mu + \mu' = \frac{p-1}{2} \cdot \frac{q-1}{2} \pmod{2}.$$


17.6.2 Proving proper parity 


So to finish the proof via Claim 17.6.2, we must show that the number of blue 
points (points under the line with even x-coordinate) has the same parity as 
the number of positive points in the green box under the line. Equivalently, 
we will show $R \equiv \mu \pmod 2$.

In the next graphic, there is a lot going on, all of which we will use for 
the proof (note especially the new, green, points). We will clarify each of the 
pieces below. 


Figure 17.6.4 The full picture of proof of QR 


Combined with our previous knowledge, can you check that the blue and green
dots in the small triangle represent

$$\mu = \sum_{e=1}^{(p-1)/2} \left\lfloor \frac{q \cdot e}{p} \right\rfloor ?$$


Let’s take a closer look at the two sets of green dots. 


• One set is on top, the lattice points with even x-coordinates greater than
$\frac{p-1}{2}$ which have y-coordinate less than q and which are above the dotted line.

• The other set is similar, but on the bottom, with odd positive x-coordinates
at most $\frac{p-1}{2}$ which have positive y-coordinate and are below the line.


You can think of the first set as filling in the even columns greater than
$\frac{p-1}{2}$, while the latter set fills in the triangle for odd columns up to $\frac{p-1}{2}$ (in
both cases, strictly inside the red box of size p by q). To further understand
this, in the interactive form of the text you may wish to try q relatively large
compared to p to see this more clearly. Try several different values!


@interact
def _(p=(11,prime_range(3,100)),q=(7,prime_range(3,100))):
    E = [2,4..p-1]  # even x-coordinates below p
    # dotted line y = (q/p)x, red p-by-q box, green half-size box
    plot4 = plot((q/p)*x,(x,0,p),linestyle='--')
    plot3 = line([[0,0],[p,0],[p,q],[0,q],[0,0]],
                 rgbcolor=(1,0,0))
    plot2 = line([[0,0],[(p-1)/2,0],[(p-1)/2,(q-1)/2],
                  [0,(q-1)/2],[0,0]], color='green')
    grid_pts_1 = [[i,j] for i in [1..p-1] for j in [1..q-1]]
    grid_pts_2 = [[i,j] for i in [1..(p-1)/2] for j in
                  [1..(q-1)/2]]
    plot_grid_pts = points(grid_pts_1,rgbcolor=(0,0,0),pointsize=10)
    # blue: below the line, even x
    lattice_pts1 = [coords for coords in grid_pts_1 if
                    (coords[0]*q-coords[1]*p>0 and coords[0]<p and
                     coords[0] in E)]
    # green (top): above the line, even x to the right of the green box
    lattice_pts2 = [coords for coords in grid_pts_1 if
                    (coords[0]*q-coords[1]*p<0 and coords[0]>(p-1)/2 and
                     coords[1]<q and coords[0] in E)]
    # green (bottom): below the line, odd x inside the green box
    lattice_pts3 = [coords for coords in grid_pts_1 if
                    (coords[0]*q-coords[1]*p>0 and coords[0]<=(p-1)/2
                     and coords[0] not in E)]
    if len(lattice_pts1)!=0:
        plot_lattice_pts1 = points(lattice_pts1, rgbcolor=(0,0,1),
                                   pointsize=20)
    else:
        plot_lattice_pts1 = Graphics()
    if len(lattice_pts2)!=0:
        plot_lattice_pts2 = points(lattice_pts2, rgbcolor=(0,.5,0),
                                   pointsize=20)
    else:
        plot_lattice_pts2 = Graphics()
    if len(lattice_pts3)!=0:
        plot_lattice_pts3 = points(lattice_pts3, rgbcolor=(0,.5,0),
                                   pointsize=20)
    else:
        plot_lattice_pts3 = Graphics()
    show(plot2+plot3+plot4 + plot_grid_pts+plot_lattice_pts1
         + plot_lattice_pts2 + plot_lattice_pts3,
         xmax=p, ymax=q, ymin=0)
    forms = r'$\mu='+'+'.join([r'\left\lfloor\frac{%s\cdot %s}{%s}\right\rfloor'%(q,e,p)
                               for e in [1..(p-1)/2]])+'$'
    pretty_print(html("The blue and green dots in the small triangle represent"))
    pretty_print(html("the sum "+forms))


The key observation is that these two sets of green dots are symmetric
images: they are simply rotated around the point

$$\left(\frac{p}{2}, \frac{q}{2}\right).$$

This makes sense, since with p and q odd, this would change odd to even and
vice versa.
So in order to say that $\mu$ has the same parity as $R$ (which is our goal to finish
the proof), we just have to show that either set of green points has the same
parity as that of the set of blue points outside the green box. Again, refer to
the interactive graphic and try it with different primes for best understanding.


Claim 17.6.5 Either set of green points has the same parity as the set of blue
points outside the green box.

Proof. There are $q - 1$ points in each column of points outside the green box.
In particular, there is an even number of points in each such column. In each
even column to the right of the green box, every point is either blue (below
the line) or green (above it), so the two counts add up to $q - 1$.
So whether the number of blue points in a given column is even or odd, it is
guaranteed that the parity of the green points in that same column is also even
or odd, respectively. So the parity of the green points outside the green box is
the same as the parity of the blue points outside the green box. ∎
This means the parity of the points inside the triangle ($\mu$) is the same as
that of the blue points ($R$), which is what we wanted to prove!
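
The parity argument can be spot-checked numerically. The following is a minimal plain-Python sketch (not one of the text's Sage cells); the helper names `legendre`, `R`, and `mu` are chosen here for illustration, and the Legendre symbol is found by brute force rather than by any library call.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p not dividing a, by brute force."""
    a %= p
    return 1 if any((x * x - a) % p == 0 for x in range(1, p)) else -1

def R(q, p):
    """Eisenstein's exponent: sum of floor(q*e/p) over even e with 0 < e < p."""
    return sum((q * e) // p for e in range(2, p, 2))

def mu(q, p):
    """Lattice points under the line inside the green box:
    sum of floor(q*e/p) for 1 <= e <= (p-1)/2."""
    return sum((q * e) // p for e in range(1, (p - 1) // 2 + 1))

primes = [3, 5, 7, 11, 13, 17, 19, 23]
for p in primes:
    for q in primes:
        if p == q:
            continue
        # R and mu have the same parity (the claim of this subsection)
        assert R(q, p) % 2 == mu(q, p) % 2
        # the two triangles together fill the green box
        assert mu(q, p) + mu(p, q) == ((p - 1) // 2) * ((q - 1) // 2)
        # hence quadratic reciprocity for this pair
        assert legendre(p, q) * legendre(q, p) == (-1) ** (((p - 1) // 2) * ((q - 1) // 2))

print(R(7, 11), mu(7, 11))
```

For the default picture, p = 11 and q = 7, both counts are odd, in agreement with 7 not being a quadratic residue of 11.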


17.6.3 Postlude 


It’s really quite amazing how we needed to understand congruence, parity, 
some geometry, and of course the idea of a quadratic residue in the first place 
to prove this. As of right now, there is a list of well over two hundred proofs¹¹
of this theorem. The very shortest might be one by G. Rousseau¹², and there
is a nice list online¹³ of “favorite proofs” by various mathematicians.

So this is one proof where it is appropriate to say Q.E.D. 


17.7 Exercises 


1. Evaluate the Legendre symbols for p = 11 and a = 2, 3, 5 using Eisenstein’s
Criterion for the Legendre Symbol.


2. Use the previous problem, your knowledge of $\left(\frac{-1}{p}\right)$, and of perfect squares
to evaluate all other Legendre symbols $\left(\frac{a}{p}\right)$ for p = 11.


3. Do any Legendre symbols in Example 17.1.5 which you didn’t already do.


4. Make up several hard-looking Legendre symbols $\left(\frac{a}{p}\right)$ (modulo p = 29)
that are easy to solve by adding p or by factoring a. Then solve them.


5. Use the multiplicative property of the Legendre symbol¹⁴ to give a
congruence condition for when $\left(\frac{-2}{p}\right) = 1$.


6. For 0 < a,b < p, prove that at least one of a,b, and ab is a quadratic 
residue of p. 

7. In Exercise 16.8.9, you explored $\sum_{a \in Q_p} a$. Now suppose $p \equiv 1 \pmod 4$;
prove that the sum of the quadratic residues of p and the sum of the
quadratic nonresidues are the same by computing both. (See [E.7.31] for
a more complex but analogous statement for the case $p \equiv 3 \pmod 4$, along
with an elementary proof thereof.)


¹¹ www.rzuser.uni-heidelberg.de/~hb3/rchrono.html
¹² dx.doi.org/10.1017/S1446788700034583
¹³ mathoverflow.net/questions/1420/whats-the-best-proof-of-quadratic-reciprocity
¹⁴ See [E.2.15, Section 50] to see an approach using the Minkowskian methods of
Subsection 13.4.1, connecting more explicitly to other algebraic structures related
to the Gaussian integers.


Exercise Group. In Example 17.5.4 there are a number of small issues
which need proof; here, you have the opportunity to finish them off.


8. Let p be a prime of the form p = 2q + 1, where q is prime (recall that q
is called a Germain prime in this case). Show that every residue from
1 to p − 2 is either a primitive root of p or a quadratic residue. (Hint:
Use Euler’s Criterion, and ask yourself how many possible orders an
element of $U_p$ can have.)


9. Prove: if $p \equiv 3 \pmod 4$, and if $a \not\equiv \pm 1, 0$, then a is a QR modulo p if
and only if p − a is not a QR.


10. Prove that for any prime p, if $1 \le i, j < \frac{p}{2}$ and $i \ne j$, then $i^2 \not\equiv j^2$
(mod p). (Hint: factor!)


11. Verify the previous exercise for p = 23.


12. Prove that if $\left(\frac{2}{n}\right)$ is the Jacobi symbol instead of the Legendre symbol, it
is still true that $\left(\frac{2}{n}\right) = 1$ precisely when $n \equiv \pm 1 \pmod 8$. (Remember, n
has to be odd by Definition 17.4.9.)


13. Verify Fact 17.4.10 by coming up with four Jacobi symbols which evaluate
to 1, but for which you verify a is not a quadratic residue of n. (For your
3
first one, why not use ()?)


14. Learn about the Goldwasser-Micali public key encryption method. How
is it implemented? What mathematics from this chapter is used?


15. Make up and compute some Legendre symbols that seem pretty hard by
using the Jacobi symbol instead.


16. If you didn’t do them already, do the exercises in Example 17.4.8.


17. Evaluate five non-obvious Legendre symbols $\left(\frac{a}{p}\right)$ for p = 47 using quadratic
reciprocity.


18. Find congruence criteria for p for when $a \in Q_p$, for a = −3, 6, and 9. (Hint:
Don’t do any extra work; use what you know!)


19. Use quadratic reciprocity to find a congruence criterion for when 5 is a
quadratic residue for an odd prime p > 5.


20. Use quadratic reciprocity to prove the surprising statement that −5 is
a quadratic residue for exactly those primes for which the sum of the
ones and tens digits is odd. (Did you conjecture this when you completed
Exercise 16.8.14? See [E.7.10] about a story behind this unusual result.)


21. Use Sage to explore why repetition in the decimal expansion of $\frac{1}{p}$ is related
to whether 10 is a primitive root modulo p.


22. Explore the Solovay-Strassen primality test. Try implementing it well
enough to check whether a number other than 221 is prime.


23. Compute two nontrivial (that is, not obviously perfect square) Jacobi
symbols for the odd composite number n = 35; then do the same for
n = 943.


Summary: Quadratic Reciprocity 


Here, we harness the power of the Legendre symbol to find a deep correlation 
between solutions of two seemingly unrelated congruences — a correlation that 
enables us to tell very quickly whether any quadratic congruence has a solution! 


1. Section 17.1 reinterprets and extends some of our work with quadratic
residues in terms of the Legendre symbol.


2. Next, there is a long buildup to the challenging, but rewarding, power of
Eisenstein’s Criterion for the Legendre Symbol.


3. We use this criterion to compute when 3 is a quadratic residue in
Proposition 17.3.4.


4. The next section has the core of the chapter. Not only do we state
Quadratic Reciprocity, we interpret it (such as in Table 17.4.4) and show
how to use it efficiently to compute (such as in Example 17.4.7). Finally,
we introduce the Jacobi symbol in Definition 17.4.9.


5. Section 17.5 gives several interesting applications.


6. Section 17.6 has a geometric proof of the main theorem.


The Exercises encourage not just computation of a wide variety of Legendre 
symbols using quadratic reciprocity, but filling in gaps in proofs (such as about 
Germain primes) and proving your own facts about when certain numbers are 
quadratic residues. 




Chapter 18 


An Introduction to Functions 


The further one goes into number theory, the more one needs to think about 
the functions involved as functions, and not just as handy computational short- 
hand. 


Question 18.0.1 What properties do number-theoretic functions (such as
$\phi(n)$) have? What can we do with them?


Most of the remainder of the text deals with such questions. This short 
chapter introduces some of the questions we will ask through the lens of one 
function we have done a fair amount with, and then through the eyes of one 
we have examined in less detail. 

The Euler function, like many we have seen and will see, is an example of an 
arithmetic function. An arithmetic function is a function with the positive 
integers as its domain, usually going to integer, real, or complex values. 


Remark 18.0.2 We pronounce this word with the stress on the third syllable 
in number theory when used as an adjective, but (as usual) on the second 
syllable when used as a noun. 


A-rith-me-tic functions show up when studying the higher a-rith- 
me-tic. 
We’ll spend a lot of time with three types of questions regarding arithmetic 
functions. For any given function, we wish to find or examine the following. 


• We want to have as explicit formulas as possible for our functions, which
are often defined implicitly or in terms of counting.


• We wish to find relational formulas, either between our function and other
functions, or especially among different values of the function itself.


• We desire to see what the long-term or aggregate behavior of the functions
is; in practice this usually involves summation of various kinds.


In this chapter, we will start the process, but it will recur throughout the 
remainder of the text. 


18.1 Three Questions for Euler phi 


It’s easier to say useful things about some functions than others! To begin, let’s 
go back and remind ourselves of some of the nice properties of one particular 
function we did study in some detail. In the next chapter, we’ll start exploring 
some functions that we have not yet encountered. 
That function is, naturally, the Euler $\phi$ function. Recall that $\phi(n)$ gives
the size of the set
{k|0<k<n, gcd(k,n) = 1} 


of residues modulo n which are coprime to n. Also don’t forget we can use 
Sage to calculate it. 


euler_phi(25)


20 


18.1.1 Formulas 


Of course, such small values can be calculated by hand. But what about larger 
ones? Surely we don’t want to have to check every number up to n just to 
compute $\phi(n)$.

And indeed, in Exercise 9.6.11 you should have gotten a formula. Do you 
remember it? The following Sage cell is a hint. 


print(factor(275))
print(euler_phi(275))
print(275*(1-1/5)*(1-1/11))


5^2 * 11
200
200


Fact 18.1.1 If n is the product of prime powers $n = \prod_{i=1}^{k} p_i^{e_i}$, then we have
the formula

$$\phi(n) = n \prod_{i=1}^{k} \left(1 - \frac{1}{p_i}\right).$$


Proof. Do Exercise 9.6.11! ∎
If you are in a classroom setting, you may want to discuss whether it seems 
likely that arbitrary arithmetic functions have formulas. 
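
To see the formula of Fact 18.1.1 at work outside of Sage, here is a small plain-Python sketch. The function names are made up for this illustration, and the factoring is naive trial division, so it is only meant for small inputs.

```python
from math import gcd

def phi_by_counting(n):
    """Count the k with 1 <= k <= n and gcd(k, n) = 1 (the definition)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def prime_divisors(n):
    """Distinct prime divisors of n, found by trial division."""
    primes = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes

def phi_by_formula(n):
    """Evaluate n * prod(1 - 1/p) over primes p dividing n, in exact integers."""
    result = n
    for p in prime_divisors(n):
        result -= result // p   # same as multiplying result by (1 - 1/p)
    return result

print(phi_by_counting(275), phi_by_formula(275))
```

Both values agree with the Sage output above for n = 275, but the second never looks at any number other than the prime divisors of n.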


18.1.2 Relations 


One piece of getting a formula for $\phi$ is the rather interesting property $\phi$ has
(Fact 9.5.2) that if m, n are coprime then $\phi(m)\phi(n) = \phi(mn)$. This is an
important general property an arithmetic function may have.


Definition 18.1.2 We say that f(n) is multiplicative if 
f(m)f(n) = f(mn) when m,n are coprime. 


◊

The terminology is kind of bad, because of course the function only ‘multiplies’
for coprime integer inputs, but since relative primality is such a fundamental
concept this seems okay nonetheless. We can test this property in the
following Sage cell.


@interact
def _(a=25, b=11):
    pretty_print(html(r"$\phi(%s)=%s\text{ and }\phi(%s)=%s$"
                      %(a, euler_phi(a), b, euler_phi(b))))
    if gcd(a,b)==1:
        pretty_print(html(r"And $\phi(%s\cdot %s)=%s\cdot %s=%s$, their product!"
                          %(a, b, euler_phi(a), euler_phi(b),
                            euler_phi(a*b))))
    else:
        pretty_print(html(r"But $%s$ and $%s$ aren't coprime, so $\phi(%s\cdot %s)=%s\neq %s\cdot %s$"
                          %(a, b, a, b, euler_phi(a*b), euler_phi(a),
                            euler_phi(b))))


So $\phi$ is multiplicative. Do you think this is an unusual property to have?
Again, in a class setting you may wish to discuss whether it seems likely 
that arithmetic functions might have some property along these lines. 
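
If you would like to test Definition 18.1.2 on many pairs at once, a brute-force check is easy to write. This is only a sketch in plain Python; `is_multiplicative_up_to` is a hypothetical helper, and passing it is evidence for a finite range of inputs, not a proof.

```python
from math import gcd

def phi(n):
    """Euler phi by direct counting."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def is_multiplicative_up_to(f, bound):
    """Check f(m) * f(n) == f(m * n) for all coprime pairs m, n <= bound."""
    return all(f(m) * f(n) == f(m * n)
               for m in range(1, bound + 1)
               for n in range(1, bound + 1)
               if gcd(m, n) == 1)

print(is_multiplicative_up_to(phi, 30))
# A non-coprime pair, where the product rule is not promised:
print(phi(4) * phi(6), phi(24))
```

The last line compares the two sides for m = 4, n = 6, a pair the definition says nothing about.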


18.1.3 Summation (and limits) 


One thing that might be useful to look at in a function is its behavior in the long 
term. In calculus, we certainly talk a lot about things like asymptotes, even 
asymptotes other than horizontal and vertical ones. Unfortunately, arithmetic 
functions don’t often look that great in this way. 

For instance, let’s look at the plot of $\phi$.
Figure 18.1.3 The Euler phi function from 1 to 100 (plot(euler_phi,1,100))


This doesn’t look like it’s “going” anywhere. 

That said, there is some regularity; we could look at the highest or lowest
points, at least. Certainly prime numbers p will always have the formula
$\phi(p) = p - 1$, and that is a nice graph; the lower limit seems reasonably regular
as well. Try to think about how one might encapsulate such observations in
terms of limits.

One strategy that is sometimes used to “smooth” such behavior in places
like analyzing stock prices is trying to calculate “averages”; that is, sum it up
and divide. We are not ready for this with $\phi$ (see Section 20.5).

However, there was a different interesting property about summation of
$\phi(n)$, namely Fact 9.5.4. To recall, what was the sum of $\phi(d)$ over the set of
divisors d of n?
@interact
def _(n=275):
    pretty_print(html("$%s$ factors as $%s$"
                      %(n, latex(factor(n)))))
    pretty_print(html("Its divisors are $%s$"%latex(divisors(n))))
    pretty_print(html(r"The sum of $\phi$ of the divisors is $%s$"
                      %sum([euler_phi(d) for d in divisors(n)])))


Ah yes, it was just that $\sum_{d \mid n} \phi(d) = n$. Even if we can’t say something
about limiting behavior yet, this kind of summation must be getting us closer!

As a final classroom discussion point, what kind of behavior do you think 
could happen when summation of arithmetic functions is considered? What 
about limits? Could you get anything you can get in calculus, or should some 
things not be possible? 
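
The divisor sum recalled above (Fact 9.5.4) is also easy to confirm by hand-rolled code. A minimal plain-Python sketch, with helper names invented for the purpose:

```python
from math import gcd

def phi(n):
    """Euler phi by direct counting."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def phi_divisor_sum(n):
    """Sum of phi(d) over the positive divisors d of n."""
    return sum(phi(d) for d in divisors(n))

print(divisors(275))
print([phi(d) for d in divisors(275)])
print(phi_divisor_sum(275))
```

Printing the individual values of phi(d) makes it easier to see how the pieces add back up to n.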


18.2 Three Questions, Again 


Hopefully your appetite is whetted a bit by the previous section, and especially 
the discussion opportunities about what you think might be possible. 
So let’s start exploring these questions with new functions. 


Definition 18.2.1 Let r(n) be the number of (all!) ways to write n as a
sum of (two) squares. (This was called $r_2(n)$ when first encountered¹ in
Exercise 13.7.7.) ◊


Example 18.2.2 For instance, r(25) = 12. Why? Because you can write it 
using the pairs 


(±3, ±4), (±4, ±3), (±5, 0) and (0, ±5).


Remember, we count all solutions, positive or negative, and in any particular 
order possible, in determining the value of r(n). 


18.2.1 Formulas 


In Exercise 13.7.7, we saw that $r(2^n) = 4$. But we didn’t discuss it enough to
question whether there might be a formula that was easier to compute than 
the process of counting all possible sums! 

As an encouragement to our search for answers to our three questions, I 
will give you a (totally unmotivated!) formula. To see what it looks like, we 
use an extension of the Fundamental Theorem of Arithmetic. 


Fact 18.2.3 Write the prime factorization of n as

$$n = 2^{d} p_1^{e_1} p_2^{e_2} \cdots p_m^{e_m} q_1^{f_1} q_2^{f_2} \cdots q_\ell^{f_\ell}$$

where we write primes of the form 4k + 1 as p, and primes of the form 4k + 3
as q. Then

$$r(n) = \begin{cases} 0 & \text{if any } f_i \text{ is odd} \\ 4\prod_{i=1}^{m} (e_i + 1) & \text{otherwise.} \end{cases}$$


¹ Although we briefly considered other $r_k$ in Example 14.2.3, and we will see another
example in the remarkable Theorem 25.8.1, it is usually more convenient to simplify the
notation.
Proof. Unfortunately, it turns out that every single proof of this is not very
short. They all either go into some detail regarding factorization of Gaussian
integers (recall our allusion to this in Fact 14.1.8), or they do some lengthy
divisibility and congruence analysis. So we will skip the proof. ∎

To use this, notice that the empty product (no primes of the form 4k + 1)
is 1, just like a sum over zero elements is zero. To prove Exercise 13.7.7, we
note that for $r(2^n)$ all $e_i$ and $f_i$ are zero, so we are in the second case
and we just get $4 \cdot 1 = 4$ for the product.


Sage note 18.2.4 Review quiz. You can use various tools we’ve already 
seen to compute this with Sage, such as factoring and multiplication. Try it! 
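
If you want something to compare your own attempt against, here is one possible sketch in plain Python rather than Sage. It assumes n ≥ 1, factors by trial division, and uses invented function names; the brute-force counter is there only to double-check the formula of Fact 18.2.3 on small inputs.

```python
from math import isqrt

def r_by_counting(n):
    """Count all integer pairs (x, y), signs and order included, with x^2 + y^2 = n."""
    m = isqrt(n)
    return sum(1 for x in range(-m, m + 1) for y in range(-m, m + 1)
               if x * x + y * y == n)

def r_by_formula(n):
    """Fact 18.2.3 for n >= 1: 0 if a prime 4k+3 has odd exponent,
    otherwise 4 times the product of (e_i + 1) over primes 4k+1."""
    result = 4
    while n % 2 == 0:          # the power of 2 does not affect the count
        n //= 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            e = 0
            while n % d == 0:
                n //= d
                e += 1
            if d % 4 == 1:
                result *= e + 1
            elif e % 2 == 1:   # prime of the form 4k+3 with odd exponent
                return 0
        d += 2
    if n > 1:                  # one leftover odd prime, with exponent 1
        if n % 4 == 1:
            result *= 2
        else:
            return 0
    return result

print(r_by_counting(25), r_by_formula(25))
```

For n = 25 both functions should reproduce the count from Example 18.2.2.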


18.2.2 Relations 


We just saw an impressive relation among values of $\phi(n)$. As an example of
it, $\phi(5)\phi(3) = \phi(15)$, since the inputs are coprime. Similarly, there are some
relations with multiplying for r, though it certainly isn’t multiplicative.


Example 18.2.5 Indeed, now that we have a formula, we can compute this. 


• For instance,
r(3)r(5) = r(15)


because both sides are zero!
• For the same reason, r(8)r(7) = r(56).


• On the other hand,


$r(25)r(13) = 12 \cdot 8 = 96 \ne 24 = r(325)$.


• Similarly, $r(25)r(4) = 12 \cdot 4 = 48 \ne 12 = r(100)$.


In these examples, the inputs are relatively prime but it doesn’t multiply. What 
might still be true? See Exercise Group 18.3.1-2. 
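
One way to start exploring is to tabulate both sides for the pairs in Example 18.2.5. The sketch below is plain Python, not a Sage cell, and counts representations by brute force; the function name `r` simply mirrors the text's notation.

```python
from math import isqrt

def r(n):
    """Number of integer pairs (x, y) with x^2 + y^2 = n, by brute force."""
    m = isqrt(n)
    return sum(1 for x in range(-m, m + 1) for y in range(-m, m + 1)
               if x * x + y * y == n)

# The coprime pairs from Example 18.2.5
for m, n in [(3, 5), (8, 7), (25, 13), (25, 4)]:
    print(m, n, r(m) * r(n), r(m * n))
```

Adding more coprime pairs to the list is a quick way to gather data before attempting the exercises.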


Sage note 18.2.6 Explore here. Feel free to explore here! 


18.2.3 Limits (and summation) 


In Subsection 18.1.3 we saw that (for $\phi$) even though we couldn’t yet address
long term behavior, we could at least see some patterns, and could say some- 
thing about summing values. In this subsection, we will try to directly address 
long-term, average behavior for r(n). 

To be precise, we will talk about limits with functions. Yes, limits in number 
theory! 

Observe the following graphic. It has as its basic content the circle with
radius $\sqrt{n}$ and blue lattice points representing all pairs (x, y) such that
$x^2 + y^2 \le n$. There is a little box of area one around each such lattice point.
Figure 18.2.7 Plotting sums of squares up to five


As you might expect, the boxes roughly cover the circle, but certainly not 
exactly. So what does this have to do with r(n)? 

Each unit box around each lattice point can be thought of as standing in 
for a representation (as a sum of squares) of a given integer less than or equal 
to n. Adding up all the areas would thus give a number, as a summation: 


$$\sum_{k=0}^{n} r(k).$$

So the area of the boxes can give us information about r.

Here, there are 21 boxes with a circle of radius $\sqrt{5} \approx 2.24$, giving a ratio of
area of boxes to the square of the radius of about 4.2. Try it interactively below.


@interact
def _(n=(5,[1..100])):
    viewsize=ceil(math.sqrt(n))+2
    # squared radii of the outer and inner dashed circles
    a=(math.sqrt(n)+1/math.sqrt(2))**2
    b=(math.sqrt(n)-1/math.sqrt(2))**2
    g(x,y)=x^2+y^2
    P=Graphics()
    P += implicit_plot(g-n, (-viewsize,viewsize),
                       (-viewsize,viewsize), plot_points = 200)
    P += implicit_plot(g-a, (-viewsize,viewsize),
                       (-viewsize,viewsize), linestyle='--',
                       plot_points = 200)
    P += implicit_plot(g-b, (-viewsize,viewsize),
                       (-viewsize,viewsize), linestyle='--',
                       plot_points = 300)
    grid_pts = [[i,j] for i in [-viewsize..viewsize] for j
                in [-viewsize..viewsize]]
    P += points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
    # lattice points inside or on the circle of radius sqrt(n)
    lattice_pts = [coords for coords in grid_pts if
                   (coords[1]^2+coords[0]^2<=n)]
    P += points(lattice_pts, rgbcolor = (0,0,1),pointsize=20)
    # unit box centered at each such lattice point
    squares=[line([[k-1/2,l-1/2],
                   [k+1/2,l-1/2],[k+1/2,l+1/2],
                   [k-1/2,l+1/2],[k-1/2,l-1/2]], rgbcolor=(1,0,0)) for
             [k,l] in lattice_pts]
    for object in squares:
        P += object
    show(P, figsize = [5,5], xmin = -viewsize, xmax =
         viewsize, ymin = -viewsize, ymax = viewsize,
         aspect_ratio=1)
    pretty_print(html("There are $%s$ boxes with a circle of radius $%s$"
                      %(len(squares),math.sqrt(n))))
    pretty_print(html("The ratio of the area of boxes to the square of the radius is $\\approx %s$"
                      %(len(squares)/(math.sqrt(n)**2))))


Fact 18.2.8 Observe that the boxes neither cover nor are covered by the circle 
in question. However, we can say two things about them. 


• These boxes will entirely cover a disk of radius $\sqrt{n}$ minus half the diagonal
length of the boxes, namely $\frac{1}{\sqrt{2}}$, which is the inner circle above.


• Likewise, they are completely contained in a disk of radius $\sqrt{n}$ plus half
the diagonal length of the boxes.
Proof. Geometry. ∎
Let’s use this fact to create a double inequality in terms of the area covered
by two circles and the squares:

$$\pi\left(\sqrt{n} - \frac{1}{\sqrt{2}}\right)^2 \le \sum_{k=0}^{n} r(k) \le \pi\left(\sqrt{n} + \frac{1}{\sqrt{2}}\right)^2$$

If we divide by n and simplify a bit, then factor, we obtain two more:

$$\pi\left(1 + \frac{-\sqrt{2n} + 1/2}{n}\right) \le \frac{1}{n}\sum_{k=0}^{n} r(k) \le \pi\left(1 + \frac{\sqrt{2n} + 1/2}{n}\right)$$
n 
We’re almost at something interesting.


• First, the limit as n goes to $\infty$ of the lower and upper bounds with each
of these inequalities exists. In fact, the limit of the bounds in both cases
is $\pi$.
• Then, the beloved squeeze theorem from calculus implies that

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n} r(k) = \pi.$$

• Finally, note that r(0) = 1, so its presence or absence will not affect the
average in the limit at all.


We can interpret this line of thought as proving and saying:
Fact 18.2.9 The average number of representations of a positive integer as a
sum of squares is $\pi$.


WHAT?! 


But it’s true. And there’s more to come. 
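
For the skeptical, the squeeze can be watched numerically. The following plain-Python sketch counts lattice points in the disk column by column (so it is the sum of r(k) for k from 0 to n) and prints the average next to the two bounds from the double inequality; the function name is invented for this illustration.

```python
import math

def lattice_points_in_disk(n):
    """Number of integer pairs (x, y) with x^2 + y^2 <= n,
    which equals r(0) + r(1) + ... + r(n)."""
    m = math.isqrt(n)
    total = 0
    for x in range(-m, m + 1):
        # for this x, y ranges over -h..h where h = floor(sqrt(n - x^2))
        total += 2 * math.isqrt(n - x * x) + 1
    return total

for n in [5, 100, 10_000, 1_000_000]:
    average = lattice_points_in_disk(n) / n
    lower = math.pi * (1 + (-math.sqrt(2 * n) + 0.5) / n)
    upper = math.pi * (1 + (math.sqrt(2 * n) + 0.5) / n)
    print(n, lower, average, upper)
```

The first row uses n = 5, the case drawn in Figure 18.2.7, and the later rows show the bounds closing in.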


18.3 Exercises 


Exercise Group. We see in Subsection 18.2.2 that r is not multiplicative. 
But could some related properties still be true? 


1. Look at the cases where zero is involved. State the broadest possible 
multiplicativity result you can for this case. 


2. Look at the second two examples in Subsection 18.2.2. There seems 
to be a specific sort of relationship in the precise way in which these 
examples are not multiplicative. What is that relationship? Can you 
prove it? (Hint: first compare the results, only then the individual 
inputs.) 


3. For a fixed p(x), let $Z_{p(x)}(n)$ be the number of solutions of the polynomial
congruence $p(x) \equiv 0 \pmod n$. Use facts from earlier in the text to
show that this function is multiplicative. Connect this to the question of
whether $-1 \in Q_n$.


4. Let the function g be given by

$$g(n) = \begin{cases} 0 & n \text{ is even} \\ 1 & n \equiv 1 \pmod 4 \\ -1 & n \equiv 3 \pmod 4 \end{cases}.$$

Show that the function g(n) is multiplicative.


5. Show that the function $D(n) = (-1)^{n-1}$ is multiplicative. (See also
Exercise 23.5.11 and Exercise Group 24.7.9-10.)


6. Compute r(n) for $0 \le n \le 10$ and compare the sum to $10\pi$.
7. Compute r(n) for n = 100, 300, and 900. Can you write down all the 
actual sums of squares for these? 
Summary: An Introduction to Functions 
This short chapter introduces us to arithmetic functions, and raises some 
interesting questions we can ask about them. 
1. In Section 18.1 we review the formula and relations for the familiar Euler
$\phi$ function, while also asking it where it is going.
2. In Section 18.2 we ask the same questions of a new function, which
culminates in the surprising Fact 18.2.9.


The Exercises just give a little chance to think about functions, in preparation 
for the next several chapters. 


Chapter 19 


Counting and Summing Divisors


Among all the possible arithmetic functions one could discuss, there is one 
family which is both truly ancient and part of cutting-edge research. We’ll let 
ourselves be inspired by the summations in the previous chapter, by summing 
the simplest functions of all and seeing what we get. 


19.1 Exploring a New Sequence of Functions 


Definition 19.1.1 For n > 0, let $\sigma_k(n)$ be defined as the sum of the kth power
of the (positive) divisors of n, thus:

$$\sigma_k(n) = \sum_{d \mid n} d^k.$$

◊
Before doing any computing, think about what special information about
a number $\sigma_1$ and $\sigma_0$ might encode.


Remark 19.1.2 Incidentally, very (very) often one will see $\sigma_0(n)$ written as
$\tau(n)$, sometimes also as d(n). Usually $\sigma_1(n)$ is written simply $\sigma(n)$, though
Euler apparently used $\int n$ in his writings (can you think why?).

Hopefully, you realized $\sigma_1$ is adding all the divisors of n (including n itself),
and that $\sigma_0$ is the number of (positive) divisors of n.

Now, get ready to explore! Try to figure out as much as you can about 
these functions. If you’re in a group in a class, you can certainly save time 
by dividing up the initial computations among yourselves, then sharing that 
information so you have a bigger data set to look at. 


Question 19.1.3 Can you find some or all of the following for these functions?
• A formula, at least for some input types.


• See if at least a limited form of multiplicativity (recall Definition 18.1.2)
holds.


You might also want to look at questions like these.


• Can two different n yield the same $\sigma_k$ (for a given k)? If so, when, or
when not? Can they be consecutive?
• Is it possible to say anything about when one of these functions yields
even results, or ones divisible by three, four, ... ?


• Clearly the size of these functions somehow is related to the size of n;
for instance, it is obvious that $\sigma_0(n) = \tau(n)$ can’t possibly be bigger than
n itself! So how big can these functions get, relative to n? How small?


• Can anything be said about congruence values of these functions? (This
is a little harder.)


If you come up with a new idea, why not challenge someone else to prove 
it? See Exercise Group 19.6.2—4 for past examples. 


19.2 Conjectures and Proofs 


Remark 19.2.1 Don’t read this section until you have tried some of the ex- 
ploration in the previous section! 

In the last section we defined some new functions, and asked some questions 
about them. You can try them by hand, or use computation to explore them 
further. 


Sage note 19.2.2 Syntax for sigma. Here is the syntax for doing this in 
Sage. However, for this function it is better to try it out by hand first! 


sigma(12,1),sigma(12,0) 


(28, 6) 


If you do not put the second argument in, Sage just computes $\sigma_1 = \sigma$ by
default.


sigma(12) 


28 


What were some of your conjectures? It is quite likely that you (or others, 
if in a class setting) discovered some of these: 


• $\sigma_1(p) = p + 1$ if p is prime.
• $\sigma_0(p^e) = e + 1$ if $p^e$ is a prime power.
• $\sigma_i$ is in fact multiplicative for i = 0, 1.


If you dug a little deeper, or had a little more time to spend, your conjectures
may have also included some like these:


• $\sigma_1(p^e) = 1 + p + p^2 + \cdots + p^e$ for $p^e$ a prime power.
• $\sigma_1(2^e) = 2^{e+1} - 1$.


• $\sigma_0(n)$ is odd precisely if n is a perfect square.


Let’s prove the most important of these things, as well as mention a few 
other useful formulas. 


19.2.1 Prime powers 


Again, usually one will have discovered various formulas that are special cases 
of the following, among others. It’s surprisingly easy to find the patterns! 
Fact 19.2.3 If $p^e$ is a perfect prime power, then

$$\sigma_0(p^e) = e + 1 \quad \text{and} \quad \sigma_1(p^e) = 1 + p + p^2 + \cdots + p^e = \frac{p^{e+1} - 1}{p - 1}.$$

Proof. There isn’t much to prove here, once discovered. Both formulas come
from the same fundamental observation.


• All possible divisors of a prime power must have only that prime as a prime
divisor, by the Fundamental Theorem of Arithmetic. So, these divisors
are just other (smaller) powers of that prime.


• There are exactly e + 1 of these divisors, and these divisors are the ones
summed up in the $\sigma_1$ formula.


The fraction formula for $\sigma_1$ is just the usual geometric summation formula
familiar from precalculus, or perhaps calculus. ∎
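
A direct check of Fact 19.2.3 on a few prime powers takes only a few lines. This sketch is plain Python; its `sigma` is a naive stand-in written for this illustration (it takes its arguments in the same order as the Sage command shown earlier, but it is not Sage's implementation).

```python
def sigma(n, k=1):
    """Sum of the k-th powers of the positive divisors of n (naive version)."""
    return sum(d ** k for d in range(1, n + 1) if n % d == 0)

for p, e in [(2, 5), (3, 4), (5, 3), (7, 2)]:
    n = p ** e
    # e + 1 divisors: 1, p, p^2, ..., p^e
    assert sigma(n, 0) == e + 1
    # geometric sum formula
    assert sigma(n, 1) == (p ** (e + 1) - 1) // (p - 1)

print(sigma(12, 1), sigma(12, 0))
```

The final line repeats the n = 12 computation from Sage note 19.2.2 as a sanity check on the stand-in.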


19.2.2 Multiplicativity 


It’s a bit harder to prove the following. See Definition 18.1.2 to remind yourself 
of the definition of multiplicative. 


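Fact 19.2.4 For any i, $\sigma_i(n)$ is multiplicative. That is,
$$\sigma_i(mn) = \sigma_i(m)\sigma_i(n) \text{ when } \gcd(m,n) = 1.$$
This automatically leads to many facts, such as this one.


Theorem 19.2.5 If we factor n > 0 as
$$n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$$


then we have formulas


$$\sigma_0(n) = \prod_{i=1}^{k} (e_i + 1) \quad \text{and} \quad \sigma_1(n) = \prod_{i=1}^{k} \left(\frac{p_i^{e_i+1} - 1}{p_i - 1}\right).$$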


We will not prove this fact directly! It is possible, and might make a good 
challenge exercise. But it is not efficient. 
Instead, we will prove below a theorem that exemplifies a general principle. 


Principle 19.2.6 In the long run, it is better to prove general results for sums 
of arithmetic functions than to do each one by itself. 


Otherwise we do an endless line of proofs like the ones we did for $\phi$ (recall
Fact 9.5.2), but for every arithmetic function.


19.2.3 A very powerful lemma 


Let $\sum_{d \mid n}$ denote the sum over all positive divisors (including 1 and n) of n.
Then we have the following result, the proof of which will be easier than the
corresponding proof for Euler’s function.


Theorem 19.2.7 If g is multiplicative and f(n) is defined as

$$f(n) = \sum_{d \mid n} g(d)$$

then f is also multiplicative.
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Proof. We follow here [E.2.1]. Let m and n be coprime; we are interested in 
f(mn). 

Basically, this all boils down to asking what the divisors of mn look like. Any 
divisor of mn must be the product of some divisor a of m and some divisor b 
of n. 

The previous observation is just about multiplication and divisibility, not even 
coprimeness. But that guarantees that a and 6 are coprime as well, given that 
m and n are. So each divisor d | mn gives us a (unique) pair of (coprime) 
divisors a and b of m and n. 

Instead of summing over all divisors of mn, we can instead sum over each 
divisor of n for each divisor of m. In symbols, 


f(mn) = 32 (ab). 
alm b|n 


Now we can use all the facts we have at hand (coprimeness, multiplicativity, 
etc.) to finish it off. 


f(mn) = $2 $2 g(ab) = S25 g(a)9(0) 


alm bln alm bln 


=(o@]} [oa] = Foyer). 
bln 


a|m 
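To see Theorem 19.2.7 in action numerically, here is a small sketch in plain Python (not Sage). It builds $f$ from $g$ exactly as in the theorem and tests $f(mn) = f(m)f(n)$ on coprime pairs up to a bound. The names `divisor_sum` and `is_multiplicative_up_to` are hypothetical helpers invented for this example, and a finite check like this is evidence, not a proof.

```python
from math import gcd

def divisors(n):
    """Brute-force list of the positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def divisor_sum(g):
    """Return the function f(n) = sum of g(d) over all positive divisors d of n."""
    return lambda n: sum(g(d) for d in divisors(n))

def is_multiplicative_up_to(f, bound):
    """Check f(mn) == f(m) f(n) for all coprime pairs with 1 <= m, n <= bound."""
    return all(f(m * n) == f(m) * f(n)
               for m in range(1, bound + 1)
               for n in range(1, bound + 1)
               if gcd(m, n) == 1)

square = lambda n: n * n            # g(n) = n^2 is multiplicative
sigma_2 = divisor_sum(square)       # so the theorem says f = sigma_2 is as well
print(is_multiplicative_up_to(sigma_2, 30))

shift = lambda n: n + 1             # g(n) = n + 1 is not multiplicative
print(is_multiplicative_up_to(divisor_sum(shift), 30))
```

The second check is there only for contrast: the theorem makes no promise when $g$ fails to be multiplicative, and in this particular case the resulting $f$ fails already at $m = n = 1$.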


Corollary 19.2.8 Since $g(n) = n^i$ is clearly multiplicative, it is true that

$$f(n) = \sum_{d \mid n} g(d) = \sum_{d \mid n} d^i = \sigma_i(n)$$

is also multiplicative.

The special cases $i = 0$ and $i = 1$ of the corollary confirm that $\sigma_0 = \tau$
and $\sigma_1 = \sigma$ are indeed multiplicative. Since it will be convenient later (see
Definition 23.3.1 and following), we give separate names to these two special
cases of $n^i$.


Definition 19.2.9 Let us set the following two arithmetic functions:
• $u(n) = 1$ is the unit function
• $N(n) = n$ is the identity function


19.3 The Size of the Sum of Divisors Function 


For the rest of this chapter, we will focus on $\sigma_1 = \sigma$ itself, since the sum of
divisors function has a deep richness of its own. We could ask questions about
evenness, other patterns, and so forth.

This short section asks a particularly interesting question. Try the following 
interactive cell. 


@interact
def _(n=range_slider(1,150,1,(1,20))):
    top = n[1]
    bottom = n[0]
    cols = ((top-bottom)//10)+1
    # Header row, repeated once per group of ten numbers
    T = [cols*['$n$',r'$\sigma(n)$',r'$\sigma(n)/n$']]
    list = [[i,sigma(i),(sigma(i)/i).n(digits=3)] for i in range(bottom,top+1)]
    # Pad with empty entries so every index used below exists
    list.extend((10-(len(list)%10))*['',''])
    for k in range(10):
        t = [item for j in range(cols) for item in list[k+10*j]]
        T.append(t)
    pretty_print(html(table(T, header_row = True, frame = True)))


This table helps you see possibilities for the relative size of $\sigma(n)$ with respect
to $n$ itself. Alternately, we have the following.


Question 19.3.1 For any given $n$, what is the constant $C_n$ such that $\sigma(n) =
C_n \cdot n$? How big can this get?


The spread of these ratios, for $n$ under one hundred fifty, certainly goes
both above and below 2. If you look carefully, you will see that only one of the
numbers above has a ratio $\sigma(n)/n$ without 1 or 2 as the integer part. What
is it?

Instead of simply trying larger and larger input numbers, we might use a 
little theory to get a higher ratio. To wit, if a number has lots of small prime 
divisors, we might think it has lots of factors. So taking big powers of these 
would have even more small prime divisors and might get us big ratios. 


@interact
def _(n=[1..15]):
    pretty_print(html(r"Try $2^{%s}\cdot 3^{%s}\cdot 5^{%s}=%s$"%(n,n,n,2^n*3^n*5^n)))
    pretty_print(html(r"Then $\sigma(%s)=%s=%s\cdot %s\approx %s\cdot %s$"%(2^n*3^n*5^n,
        sigma(2^n*3^n*5^n),
        sigma(2^n*3^n*5^n)/(2^n*3^n*5^n), 2^n*3^n*5^n,
        (sigma(2^n*3^n*5^n)/(2^n*3^n*5^n)).n(digits=3),
        2^n*3^n*5^n)))


You’ll notice that although we quickly get a ratio above 3 (so that $\sigma(n) >
3n$), we don’t seem to get much further. Why?

A helpful thing to think about with this is the following rewrite, using the
formula for $\sigma(n)$ with the usual writing of $n = p_1^{e_1} \cdots p_k^{e_k}$:

$$\frac{\sigma(n)}{n} = \frac{\prod_{i=1}^{k} \frac{p_i^{e_i+1}-1}{p_i-1}}{\prod_{i=1}^{k} p_i^{e_i}} = \prod_{i=1}^{k} \frac{p_i - 1/p_i^{e_i}}{p_i - 1} \approx \prod_{i=1}^{k} \frac{p_i}{p_i - 1}$$

Based on this, we should expect this approximation to be very close when the $e_i$
are all quite large. Then for large numbers, since $\frac{p}{p-1} > 1$, if we multiply by
enough of these we will get very large numbers and so $\sigma(n)/n$ will be greater
than any given $C$, and then $\sigma(n) > Cn$.

Of course, $p = 2$ is the best for this since $\frac{2}{2-1} = 2$, but the other primes
will hopefully be useful for this as well. For instance, $n = 2^{10} 3^{10}$ will have

$$\sigma(n)/n = \frac{2 - 1/2^{10}}{2-1} \cdot \frac{3 - 1/3^{10}}{3-1} \approx \frac{2}{1} \cdot \frac{3}{2} = 3$$

so certainly $\sigma(6^{10})$ will be nearly $3 \cdot 6^{10}$.
If we multiply it by 5 as well that should do it, and that gives the results
we saw in the previous cell:

$$\frac{2 - 1/2^{10}}{2-1} \cdot \frac{3 - 1/3^{10}}{3-1} \cdot \frac{5 - 1/5}{5-1} \approx \frac{2}{1} \cdot \frac{3}{2} \cdot \frac{6}{5} = \frac{18}{5} > 3$$
We can check out some of these ideas, and how much bigger we can get. 


print((sigma(6^10)/(6^10)).n())
print((sigma(5*6^10)/(5*6^10)).n())


2.99851822943128 
3.59822187531753 


print((sigma(2^4*3^4*5^4*7)/(2^4*3^4*5^4*7)).n(digits=3))


4.13 


N = prod([p^4 for p in primes_first_n(100)])
print((sigma(N)/N).n(digits=3))


10.9 
Continuing this for more primes suggests the following. 


Fact 19.3.2 For any positive $C$, there is a positive integer $n$ such that

$$\sigma(n) > Cn.$$

The argument outlined above is not completely rigorous, but is good enough 
for now. Trying to prove it this way could bring the distribution of primes to 
the table, so doing so might not be trivial. (As it happens, one can prove this 
in a very elementary way; see [E.4.5, Section 3.6].) 
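To get a feel for how slowly the ratio grows, here is a sketch in plain Python (not Sage) that takes a slightly different route from the text: instead of high powers of a few primes, it multiplies the first several primes together, each to the first power only, so that $\sigma(n)/n = \prod (p+1)/p$. The function names are invented for this illustration, and the loop in `primes_needed` simply assumes that the product eventually passes $C$, which is the content of Fact 19.3.2 rather than something the code proves.

```python
from fractions import Fraction

def first_primes(count):
    """Return the first `count` primes by trial division against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def primorial_index(count):
    """sigma(n)/n for n = product of the first `count` primes, each to the first power."""
    ratio = Fraction(1)
    for p in first_primes(count):
        ratio *= Fraction(p + 1, p)     # sigma(p)/p = (p + 1)/p
    return ratio

def primes_needed(C):
    """Smallest number of leading primes whose product n satisfies sigma(n) > C*n."""
    count = 1
    while primorial_index(count) <= C:
        count += 1
    return count

for C in [2, 3, 4, 5]:
    k = primes_needed(C)
    print(C, k, float(primorial_index(k)))
```

Using exact fractions avoids any doubt about rounding when the ratio lands very close to $C$.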


19.4 Perfect Numbers 


19.4.1 A perfect definition and theorem 


Definition 19.4.1 When the ratio $\frac{\sigma(n)}{n}$ is exactly 2, we say $n$ is a perfect
number. ◊

This is a big definition, and it goes back at least to Euclid, who defines
the notion¹ at the beginning of the number-theoretic books of the Elements.
It is easy to see this is the same thing as $n$ being the sum of all of its proper
divisors², which is Euclid’s characterization. Indeed, the Greek³ is τέλειος
ἀριθμός, which might better be translated as “complete number”, as is done in
many languages. In modern English usage, ‘completeness’ captures the concept
of being comprised of everything (and hence also being without flaw) better
than ‘perfection’, but in English these numbers are universally called perfect.

Euclid only mentions this concept again over one hundred propositions
later, where he proves that certain numbers are, in fact, perfect. (A careful
reader will notice that the primes in question are, in fact, the Mersenne primes
of Definition 12.1.6!) Such a conclusion is a fitting end, as William Dunham
says in his book, Journey through Genius [E.5.5].


¹aleph0.clarku.edu/~djoyce/java/elements/bookVII/defVII22.html
²Historically these were called aliquot parts in this context.
³www.claymath.org/euclid/index/book-7-definitions
Theorem 19.4.2 If $n$ is a number such that $2^n - 1$ is prime, then the (even)
number $2^{n-1}(2^n - 1)$ is perfect.
Proof. Euclid’s proof⁴ (in the link) of this is worth looking at. ∎


Many centuries later, Euler proved the converse; we will prove them together.
(See also Chapter 1 of Dunham’s Euler: The Master of Us All [E.5.6].)


Theorem 19.4.3 Characterization of Even Perfect Numbers. If $N$ is
an even number, it is perfect if and only if it is the product of a power $2^{n-1}$
and a prime of the form $2^n - 1$ (for the same $n$).


Proof. First, assume that $2^n - 1$ is prime. Then the factors of $N = 2^{n-1}(2^n - 1)$
are coprime, so

$$\sigma\left(2^{n-1}(2^n - 1)\right) = \sigma\left(2^{n-1}\right)\sigma\left(2^n - 1\right) = (2^n - 1)(2^n - 1 + 1)$$

The steps are because of multiplicativity and the formulas we had earlier (see
Theorem 19.2.5) for $\sigma$ of powers of two and primes. But then

$$(2^n - 1)(2^n - 1 + 1) = 2^n(2^n - 1) = 2\left[2^{n-1}(2^n - 1)\right]$$

so that the sum of divisors is exactly twice the original number.

Now for the converse, which is somewhat longer. Let us start with an even
perfect number $N$, which is perforce divisible by some power of two.

Looking ahead, call this power the $(n-1)$th power! Then our even perfect
number may be written as $N = 2^{n-1}q$, where $q$ is the (odd) quotient.

Let’s divide the rest of the proof into several pieces. First, two facts.

• We know that this number is perfect, so
$$\sigma\left(2^{n-1}q\right) = 2 \cdot 2^{n-1}q = 2^n q$$
• We also know how to compute $\sigma$, so
$$\sigma\left(2^{n-1}q\right) = \sigma\left(2^{n-1}\right)\sigma(q) = (2^n - 1)\sigma(q)$$

We can combine these observations to see that

$$2^n q = (2^n - 1)\sigma(q)$$

Note that this means $2^n - 1 \mid q$, since $q$ is the only odd part of the left-hand
side (implicitly using some of Theorem 6.3.2). Let’s write

$$(2^n - 1)m = q.$$

Substituting, we have

$$2^n(2^n - 1)m = (2^n - 1)\sigma(q) \Rightarrow 2^n m = \sigma(q)$$

Since $m$ and $q$ both divide $q$ (and are different, because $n \geq 2$), by the definition of $\sigma$ we have

$$\sigma(q) \geq q + m = (2^n - 1)m + m = 2^n m$$

Since these two divisors ($q$ and $m$) alone add up to $\sigma(q)$, it must be true that $q$
has exactly these two divisors, so it is prime. That means $m = 1$, and $q = 2^n - 1$,
and so the perfect number $N$ equals $2^{n-1}(2^n - 1)$. Great! ∎
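Both directions of Theorem 19.4.3 can be checked on small cases. The sketch below is plain Python rather than Sage, so `sigma` and `is_prime` are reimplemented here as simple stand-ins (they are not Sage's functions), and `euclid_euler_candidates` is a name invented for this example. The search bound of 10000 is an arbitrary choice.

```python
from math import isqrt

def sigma(n):
    """Sum of the positive divisors of n, pairing each d <= sqrt(n) with n // d."""
    total = 0
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
    return total

def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))

def euclid_euler_candidates(max_n):
    """Yield (n, N) with N = 2^(n-1) * (2^n - 1) whenever 2^n - 1 is prime."""
    for n in range(2, max_n + 1):
        if is_prime(2 ** n - 1):
            yield n, 2 ** (n - 1) * (2 ** n - 1)

# Euclid's direction: each such N should satisfy sigma(N) == 2N.
for n, N in euclid_euler_candidates(13):
    print(n, N, sigma(N) == 2 * N)

# Euler's direction, by brute force: the even perfect numbers up to 10000.
even_perfect = [N for N in range(2, 10001, 2) if sigma(N) == 2 * N]
print(even_perfect)
```

Comparing the two printed lists shows that, at least in this range, the brute-force search finds nothing beyond what the formula produces.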


⁴aleph0.clarku.edu/~djoyce/java/elements/bookIX/propIX36.html
We will leave the question about whether there are odd perfect numbers to
Section 19.5.


19.4.2 Speculation and more terminology 


There are many things people have claimed about numbers of this type. A
Hellenistic Roman in the first century in Gerasa⁵ named Nicomachus claimed
that the $n$th perfect number had $n$ decimal digits. Nicomachus was more
concerned with mystical claims about perfect numbers (which many repeated,
see [E.4.5, Chapter 3]), but this mathematical assertion continued to be made
for over a thousand years by most commentators. However, knowing what
we do about Mersenne primes (recall Definition 12.1.6), we see that the fifth
possible $n$ is 13, so that the next perfect number,

$$(2^{13} - 1) \cdot 2^{12},$$

was very large and so lay mysterious for a long time. It was apparently discovered
in the fifteenth century.


(2^13-1)*2^12


33550336 
Until the early modern period, such numbers were basically inaccessible.
Number theorists (often of the amateur variety, but certainly not always)
have come up with all kinds of other names for various concepts related to
$\sigma(n)/n$.


Definition 19.4.4 Recall that if $\sigma(n) = 2n$, then $n$ is perfect.
• If $\sigma(n) = kn$ for some integer $k$, then we say that $n$ is $k$-perfect.
• Or, if $\sigma(n) > 2n$, then $n$ is abundant.
• If $\sigma(n) < 2n$, we say $n$ is deficient.

◊

As it will turn out, these things are not really good characterizations of what 

it means to have “too many” or “too few” divisors. However, in recognition of 
the Greeks’ contributions we keep this allusive and fairly standard terminology. 
(Nicomachus is responsible for the two latter names, and they seem to have
stuck, since medieval commentators such as Boethius waxed rhapsodic over
them; see [E.4.5, Section 2.1].) As examples, Exercise 19.6.7 asks for a 3-perfect
number, if one exists, and Exercise 19.6.17 asks for a 4-perfect number.


Definition 19.4.5 Here are some less well-known, but nonetheless interesting,
terms.

• A number is pseudoperfect if it is the sum of some of its divisors (other
than itself).

• A number $n$ is superabundant if the ratio $\sigma(n)/n$ for $n$ is bigger than
the value of the ratio for all smaller $m < n$.

• A number is weird if it is abundant but not pseudoperfect. (There is a
famous paper of Erdős on this topic.)

◊
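Definitions 19.4.4 and 19.4.5 translate almost word for word into code, which makes them easy to experiment with. What follows is a plain Python sketch (not Sage), with function names chosen just for this illustration. Pseudoperfection is tested by tracking which totals are reachable as sums of distinct proper divisors, which is one reasonable reading of "the sum of some of its divisors".

```python
from fractions import Fraction

def proper_divisors(n):
    """All positive divisors of n other than n itself."""
    return [d for d in range(1, n) if n % d == 0]

def classify(n):
    """Return 'perfect', 'abundant' or 'deficient' by comparing sigma(n) with 2n."""
    s = sum(proper_divisors(n))       # this is sigma(n) - n
    if s == n:
        return "perfect"
    return "abundant" if s > n else "deficient"

def is_pseudoperfect(n):
    """True if some collection of distinct proper divisors of n adds up to n."""
    reachable = {0}                   # totals obtainable from the divisors seen so far
    for d in proper_divisors(n):
        reachable |= {t + d for t in reachable if t + d <= n}
    return n in reachable

def is_weird(n):
    """Abundant but not pseudoperfect."""
    return classify(n) == "abundant" and not is_pseudoperfect(n)

def superabundant_up_to(bound):
    """List n <= bound whose ratio sigma(n)/n beats that of every smaller number."""
    best = Fraction(0)
    found = []
    for n in range(1, bound + 1):
        ratio = Fraction(sum(proper_divisors(n)) + n, n)
        if ratio > best:
            best = ratio
            found.append(n)
    return found

print([classify(n) for n in (8, 28, 30)])
print(is_pseudoperfect(20))           # 20 = 10 + 5 + 4 + 1
print(superabundant_up_to(60))
```

The brute-force divisor lists keep the code short but slow; for serious searching one would want a sieve or a factorization-based $\sigma$.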


⁵Interestingly, this is the same place as one setting of the Biblical story of the demons
called “Legion” who went into swine.
There are many questions one can ask about these and other definitions;
see Exercise Group 19.6.15–21. One cheeky such question is this.


Question 19.4.6 Is a perfect number pseudoperfect? 


19.4.3 The abundancy index 


It’s time to give a name to the mysterious ratio at the core of this section. 


Definition 19.4.7 The ratio $\frac{\sigma(n)}{n}$ may be called the abundancy index of $n$.

◊

A beautiful thing is that once you name a concept, you can ask questions
about it. Here’s another largely open question which seems like it should be
easy...


Question 19.4.8 Rather than asking which integers can be gotten, which
rational numbers can be gotten as $\frac{\sigma(n)}{n}$?

@interact
def _(n=(20,[1..200])):
    cols = ceil(n/10)
    # Header row, repeated once per group of ten numbers
    T = [cols*['$n$',r'$\sigma(n)/n$']]
    list = [[i,(sigma(i)/i)] for i in range(1,n+1)]
    # Pad with empty entries so every index used below exists
    list.extend((10-(len(list)%10))*['',''])
    for k in range(10):
        t = [item for j in range(cols) for item in list[k+10*j]]
        T.append(t)
    pretty_print(html(table(T,header_row = True, frame = True)))


There are some interesting theorems about this already known. For one
thing, the abundancy index is the same thing as $\sigma_{-1}(n)$.


Fact 19.4.9

$$\frac{\sigma(n)}{n} = \sigma_{-1}(n)$$

Proof. We have that

$$\sigma(n)/n = \left(\sum_{d \mid n} d\right) / n$$

Now note that for every $d \mid n$, the quotient $n/d$ is also an integer divisor $d'$ of $n$. So

$$\sigma(n)/n = \sum_{d \mid n} \frac{1}{d'}$$

This is the same list as the original divisor list, so reordering gives

$$\sigma(n)/n = \sum_{d \mid n} \frac{1}{d} = \sigma_{-1}(n)$$ ∎


Fact 19.4.10 Clearly all such numbers are in the interval $[1,\infty)$. Here are
some more known facts about the abundancy index.

• If $m \mid n$, then $\sigma_{-1}(n) \geq \sigma_{-1}(m)$.
• If $\sigma_{-1}(n) = \frac{a}{b}$ in lowest terms, then $b \mid n$.
• If $r$ is “caught” between $\sigma(n)$ and $n$ (such that $n < r < \sigma(n)$) and is
relatively prime to $n$, then $r/n$ is not an abundancy index.

Proof. We skip the proof, but proving the first two facts is left as Exercise 19.6.22. ∎
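Exact rational arithmetic makes it easy to experiment with the abundancy index. The sketch below, in plain Python with the standard `fractions` module, computes $\sigma(n)/n$ and $\sigma_{-1}(n)$ separately and then tests Fact 19.4.9 and the first two items of Fact 19.4.10 for every $n$ up to a bound. The helper names are assumptions made for this example, and passing such a test is not a substitute for the proofs requested in the exercises.

```python
from fractions import Fraction

def divisors(n):
    """Brute-force list of the positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def abundancy(n):
    """The abundancy index sigma(n)/n as an exact fraction."""
    return Fraction(sum(divisors(n)), n)

def sigma_minus_1(n):
    """sigma_{-1}(n): the sum of the reciprocals of the divisors of n."""
    return sum(Fraction(1, d) for d in divisors(n))

def facts_hold(bound):
    """Test Fact 19.4.9 and the first two items of Fact 19.4.10 for 1 <= n <= bound."""
    for n in range(1, bound + 1):
        index = abundancy(n)
        if index != sigma_minus_1(n):              # Fact 19.4.9
            return False
        if n % index.denominator != 0:             # index = a/b in lowest terms, b | n
            return False
        if any(index < abundancy(m) for m in divisors(n)):   # m | n
            return False
    return True

print(facts_hold(200))
print(abundancy(120), abundancy(45))
```

`Fraction` always stores its value in lowest terms, which is what lets `index.denominator` play the role of $b$ in the second fact.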

Holdener and Stanton picturesquely call rational numbers which are not
abundancies abundancy outlaws. The end of this hyper-linked paper⁶ [E.7.11]
has a nice list of which numbers thus far have been found, and which have
not.


19.4.4 Amicable Numbers 


Another interesting idea of summing divisors is still of ancient provenance, 
though not quite as old as Euclid. 


Definition 19.4.11 A pair of positive integers $m, n$ such that $\sigma(n) = \sigma(m) =
m + n$ is called a pair of amicable numbers. ◊

Clearly any perfect number is amicable (or ‘friendly’) with itself. As with 
perfect numbers, we can characterize them as pairs of numbers whose proper 
divisors add to each other. 

The smallest pair of unequal amicable numbers is (220, 284). This was
known by the time of late Greek antiquity, where Iamblichus’ commentary
on Nicomachus seems to be the first reference; the connection was already a
somewhat mystical one in terms of friendship, based on the mutual summation
to each other. Similarly to perfect numbers, some Islamic writers likewise
cherished these in a mystical sense (see for instance [E.5.3, Section 5-3]).

Eventually, early modern European commentators mentioned amicable numbers,
or at least this pair, in related contexts. See if you can find it in the following
image⁷ from an appendix of sorts to the Harmonie Universelle, Mersenne’s
monumental compendium of practical and theoretical music.


⁶www.cs.uwaterloo.ca/journals/JIS/VOL10/Holdener/holdener7.pdf

⁷Courtesy of the French National Library and its online repository, Gallica at
gallica.bnf.fr. The license does not allow for commercial use of these images. This image
is actually a pastiche of parts of observation 13, in order not to give away the answers to
some exercises!

[Image not reproduced: page excerpts in French headed “Des parties aliquotes, de 120, et des nombres amiables”.]

Figure 19.4.12 Excerpts from Nouvelle Observationes regarding amicable 
numbers 

Strictly mathematical advances on this topic came from work in the Islamic
world inspired by the Greek sources. Thabit ibn Qurra worked on many
questions related to $\sigma$ (see Exercise 19.6.29); just as Theorem 19.4.2 is a formula
of sorts, dependent upon the existence of certain types of primes, his
Algorithm 19.4.14 may be viewed the same way.


Historical remark 19.4.13 Thabit ibn Qurra. The ninth-century Arab
doctor and mathematician Thabit ibn Qurra⁸ was probably responsible for a
number of Arabic translations of Greek mathematics in his time in the “House
of Wisdom” of the Caliphs of Baghdad. Interestingly, he did not include a single
example of either perfect numbers or amicable numbers, despite clearly being in
control of effective information about them. He made important contributions
to the question of the parallel postulate in geometry.


Algorithm 19.4.14 Get Amicable Numbers. Here is one way to get
amicable numbers.

• Make a list of numbers of the form $p_n = 3 \cdot 2^n - 1$ and $q_n = 9 \cdot 2^{2n-1} - 1$.
• Then check if $p_{n-1}$, $p_n$, and $q_n$ are all prime.
• If so, then $2^n p_{n-1} p_n$ and $2^n q_n$ are an amicable pair.
Proof. Since only primes and powers of two are involved, it’s easy to calculate
$\sigma$ in this case, so proving it is left as an exercise (see Exercise 19.6.21). ∎

⁸mathshistory.st-andrews.ac.uk/Biographies/Thabit/

Several centuries later, al-Farisi and ibn al-Banna seem to have independently
used Thabit’s formula to exhibit the second known amicable pair, 18416
and 17296. Even more impressively, at the beginning of the seventeenth century
the otherwise obscure Persian mathematician Muhammad Baqir Yazdi⁹
used this same formula to obtain the pair 9363584 and 9437056, where $n = 7$.
See [E.5.11, Section 2.6] for many more details of this era. You can try the
formula yourself in the following Sage cell.


@interact
def _(n=[2..20]):
    pretty_print(html("We have $p_{%s}=%s$ and $p_{%s}=%s$"%(n-1,3*2^(n-1)-1,n,3*2^(n)-1)))
    pretty_print(html("And $q_{%s}=%s$ as well."%(n,9*2^(2*n-1)-1)))
    # Thabit's rule needs all three of p_(n-1), p_n, q_n to be prime
    if is_prime(3*2^n-1) and is_prime(3*2^(n-1)-1) and is_prime(9*2^(2*n-1)-1):
        pretty_print(html("Then the pair $%s$ and $%s$ is amicable!"%(2^n*(3*2^(n-1)-1)*(3*2^(n)-1),
            2^n*(9*2^(2*n-1)-1))))
    else:
        pretty_print(html("Doesn't give an amicable pair"))


At about the same time as Yazdi, Fermat and Descartes both worked on
this question (which is where Mersenne learned of it), and independently
rediscovered both the formula and these pairs (see [E.5.8, II.IV]). Later, Euler
expanded the Thabit/Fermat formula significantly and found several dozen new
pairs. But it turns out that the next smallest pair, one everyone had missed
by attempting to find a formula, was found by a sixteen-year-old Italian boy
in 1866!


sigma(1184),sigma(1210),1184+1210


(2394, 2394, 2394) 
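Trial and error is much easier for us than it was in 1866. Here is a sketch in plain Python (not Sage) of a brute-force search for amicable pairs, which needs no formula at all. It first tabulates $s(n) = \sigma(n) - n$ with a sieve over divisors and then looks for $m < n$ with $s(m) = n$ and $s(n) = m$, which is the same condition as $\sigma(m) = \sigma(n) = m + n$. The function names and the bound of 3000 are choices made just for this example.

```python
def aliquot_sums(bound):
    """Return a list s with s[n] = sigma(n) - n for 0 <= n <= bound, via a divisor sieve."""
    s = [0] * (bound + 1)
    for d in range(1, bound // 2 + 1):
        # d is a proper divisor of 2d, 3d, 4d, ...
        for multiple in range(2 * d, bound + 1, d):
            s[multiple] += d
    return s

def amicable_pairs(bound):
    """All pairs (m, n) with m < n <= bound whose proper divisors sum to each other."""
    s = aliquot_sums(bound)
    pairs = []
    for m in range(2, bound + 1):
        n = s[m]
        if m < n <= bound and s[n] == m:
            pairs.append((m, n))
    return pairs

print(amicable_pairs(3000))
```

Requiring $m < n$ leaves out the perfect numbers, which are amicable only with themselves. The output includes both the ancient pair (220, 284) and the 1866 pair (1184, 1210).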


Apparently he came up with this by trial and error, though no one knows
for sure¹⁰. The internet can provide some of the most current data¹¹ on these
pairs, though sadly the best website is now out of service. The hope is that
there are infinitely many such pairs, but there is currently no proof of this
conjecture. At any rate, it can’t be too infinite; Nguyen and Pomerance have
shown that, however many there are, the sum of their reciprocals is no greater
than 215.


19.5 Odd Perfect Numbers 


19.5.1 Are there odd perfect numbers? 


Let’s return to a question alluded to earlier -- one whose answer is still unknown 
after two and a half millennia: 


Question 19.5.1 Does there exist an odd perfect number? 
Yikes! 
We do know some things about the question. Here are some fairly easy 
facts. 


⁹de.wikipedia.org/wiki/Muhammad_Baqir_Yazdi
¹⁰hsm.stackexchange.com/questions/17480/
¹¹web.archive.org/web/20131212143320/http://amicable.homepage.dk/knwnc2.htm
Theorem 19.5.2 Odd perfect numbers aren’t simple. Here are simple
forms of numbers that can’t be perfect.

• An odd perfect number cannot be a prime power.
• An odd perfect number cannot be a product of exactly two prime powers.
• An odd perfect number cannot be a product of exactly three prime powers
unless the first two are $3^e$ and $5^f$.
Proof. We leave many details to Exercise 19.6.24. The easiest way to approach
this is by cases and subcases, using the computation from Section 19.3 that

$$\frac{\sigma(n)}{n} = \prod_{i=1}^{k} \frac{p_i - 1/p_i^{e_i}}{p_i - 1} < \prod_{i=1}^{k} \frac{p_i}{p_i - 1}$$

when $n$ is a product of the prime powers $p_i^{e_i}$.

• An odd perfect number cannot be a prime power. This is easy; using
the computation for $k = 1$ would require $2 = \frac{\sigma(n)}{n} < \frac{p}{p-1}$. Even for
$p = 2$, $2 < p/(p-1)$ isn’t possible; since we are looking for an odd perfect
number, it definitely won’t be possible!

• An odd perfect number cannot be a product of exactly two prime powers.
Use the same idea, but now with the biggest possible values for odd
primes.

• An odd perfect number cannot be a product of exactly three prime powers
unless the first two are $3^e$ and $5^f$. This proof is slightly longer.

  ◦ Suppose that 3 is not the smallest prime involved. Then the biggest
  that
  $$\frac{p_1}{p_1 - 1} \cdot \frac{p_2}{p_2 - 1} \cdot \frac{p_3}{p_3 - 1}$$
  can be is
  $$\frac{5}{4} \cdot \frac{7}{6} \cdot \frac{11}{10} = \frac{77}{48}$$
  and this fraction is still less than 2.

  ◦ Suppose that 5 is not the second-smallest prime involved (assuming
  3 is the smallest). We again get a contradiction.

This proof is from [E.2.8, Section 3.3A], which has even more details, including
a full elementary proof that an odd perfect number must have four different
prime factors! ∎
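The case analysis in this proof comes down to a handful of fraction computations, which can be checked exactly. This is a plain Python sketch, and `upper_bound` is a name invented here. It relies on the observation, not spelled out above, that $p/(p-1)$ shrinks as $p$ grows, so in each case it is enough to test the smallest primes that the case allows.

```python
from fractions import Fraction

def upper_bound(primes):
    """Product of p/(p-1) over the given primes.

    This is a strict upper bound for sigma(n)/n whenever the prime
    factors of n are exactly these primes (to any positive powers).
    """
    bound = Fraction(1)
    for p in primes:
        bound *= Fraction(p, p - 1)
    return bound

cases = [
    (3,),         # one odd prime
    (3, 5),       # two odd primes
    (5, 7, 11),   # three primes, 3 not among them
    (3, 7, 11),   # three primes, 3 smallest but 5 missing
    (3, 5, 7),    # the one shape the theorem does not rule out
]
for primes in cases:
    b = upper_bound(primes)
    print(primes, b, b < 2)
```

Only the last case gives a bound above 2, which is why the theorem has to leave the door open for $3^e 5^f$ times a third prime power.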


19.5.2 The abundancy index and odd perfect numbers 


What is particularly interesting about this is that we can connect odd perfect 
numbers to a non-integer abundancy index in a surprising way! The connection 
below is due to P. Weiner in [E.7.14]. 

We begin with a useful lemma, which answers questions very closely related 
to Exercises 19.6.11 and 19.6.12. 
Lemma 19.5.3 If $n$ and $\sigma(n)$ are both odd, then $n$ is a perfect square.
Proof. If $n$ is odd, it is a product of odd prime powers. Let’s look at $\sigma$ as
applied to each piece, thanks to multiplicativity.
If $\sigma(n)$ is odd, then each factor $1 + p + p^2 + \cdots + p^e$ is odd. Such a factor of
$\sigma(n)$ is a sum of odd numbers, which is only odd if there is an odd number of
them.
Since there are $e + 1$ summands, $e$ must be even for every prime $p$ dividing $n$,
which finishes proving the lemma. ∎


Theorem 19.5.4 If $\frac{5}{3}$ is the abundancy index of $N$, then $5N$ is an odd perfect
number.

Proof. Assume this works for some $N$. Then $3\sigma(N) = 5N$.

Let’s look at divisors. First, $3 \mid N$. So if $N$ is even, then $6 \mid N$, so by
Fact 19.4.10,

$$\sigma_{-1}(N) \geq \sigma_{-1}(6) = 2 > \frac{5}{3},$$

which is impossible. If $N$ is not even, then $N$ is odd, so $3\sigma(N) = 5N$ is odd,
which implies $\sigma(N)$ itself is odd.

Since $3 \mid N$ and using Lemma 19.5.3, we see that we must have that $3^2 \mid N$.
Let’s return to the divisors. We know that $5 \nmid N$, because otherwise

$$\sigma_{-1}(N) \geq \sigma_{-1}(3^2 \cdot 5) = \frac{26}{15} > \frac{5}{3},$$

which is again impossible.
Now we can compute directly that

$$\sigma_{-1}(5N) = \sigma_{-1}(5)\sigma_{-1}(N) = \frac{6}{5} \cdot \frac{5}{3} = 2,$$

so the odd number $5N$ is perfect. ∎

19.5.3 Even more about odd perfect numbers, if they exist 


Naturally, all of this is somewhat elementary; there are many more criteria.
They keep on getting more complicated, so I can’t list them all, but here is a
selection, including information from a big computer-assisted search¹² ¹³ going
on right now.
Fact 19.5.5 An odd perfect number must (as of 2021):

• Be greater than $10^{1500}$. (The most recent announcement¹⁴ says researchers
have ‘pushed the computation to $10^{2000}$’, and you can help try
to factor¹⁵ some desired numbers to help compute up to $10^{2100}$.)

• Have at least 101 prime factors (not necessarily distinct).

• Have at least 10 distinct prime factors. (This is new and relies on
heavy computation by Pace Nielsen in Odd perfect numbers, Diophantine
equations, and upper bounds in Mathematics of Computation¹⁶.)

• Have a largest prime factor at least $10^8$.

• Have a second largest prime exceeding 10000.

• Have the sum of the reciprocals of the prime divisors of the number between
about 0.6 and 0.7.

• Have the sum of the reciprocals of odd perfect numbers be finite (since
the sum of the reciprocals of all perfect numbers is finite!). In fact, the
sum of the reciprocals of odd perfects must be less than $2 \times 10^{-150}$ (see
[E.7.6]), and that of all perfects is less than about 0.0205.

• Obey the rule that if $n$ is an odd perfect number, then $n \equiv 1 \pmod{12}$ or
$n \equiv 9 \pmod{36}$.

¹²www.lirmm.fr/~ochem/opn/

¹³There was another search at oddperfect.org but they seem to have let their domain lapse,
so it is unclear whether it is still a going concern. (Search web.archive.org for the status in
2019.)

For another introduction to the problem focusing on ‘near-misses’/‘spoofs’,
see this article in Quanta magazine¹⁷.
As an appropriate way to finish up this at times overwhelming overview,
since Euler finished the characterization of even perfect numbers, let us present
his own criterion for odd perfects! (See also the linked article¹⁸ [E.7.19] by
Euler expert Ed Sandifer.)


Proposition 19.5.6 An odd perfect number must be of the form $p^e m^2$, where
$m$ is odd, $p$ is prime, and $p$ and $e$ are both $\equiv 1 \pmod 4$.


19.6 Exercises 


1. Review the proof of Fact 9.5.2 that $\phi(n)$ is multiplicative. Can you think
of a way to modify it directly to prove that $\sigma$ or $\sigma_0$ are multiplicative?


Exercise Group. My students discovered various facts about the functions 
in this chapter on their own; why not you? 


2. Conjecture and prove a formula for the difference between $\sigma_k(p)$ and
$\sigma_k(p^2)$. (Thanks to Becca Brule and Olivia Gray.)


3. Conjecture and prove a necessary (or even sufficient) criterion for when
$5 \mid \sigma_2(2k)$. (Thanks to Andrew Kwiatkowski and Daniel Brito.)


4. Come up with some new (to you) conjecture about one of these func- 
tions you observed from the data, and which isn’t mentioned in this 
book. Tell what led you to this conjecture. 


5. Read Euclid’s original proof¹⁹ that certain even numbers are perfect and
write it down in modern notation.

6. Do you think perfect numbers as defined in Definition 19.4.1 should be 
called perfect? Why or why not? Establish a connection to GIMPS. 


7. Please find a number such that $\sigma(n) = 3n$. (This was apparently first
done in Robert Recorde’s Whetstone of Witte in 1557, where we also find
the equals sign for the first time.)


8. Could there be a function $g(n)$ which is multiplicative, where $g(2n) = 0$,
$g(n) = a_1 = 1$ if $n \equiv 1 \pmod 8$, $g(n) = a_2$ if $n \equiv 3 \pmod 8$, $g(n) = a_3$ if
$n \equiv 5 \pmod 8$, and $g(n) = a_4$ if $n \equiv 7 \pmod 8$?

9. Let $\tau_o(n)$ and $\sigma_o(n)$ be the same as $\tau$ and $\sigma$ but where only odd divisors of
$n$ are considered; let $\tau_e$ and $\sigma_e$ be similar for even divisors of $n$. Evaluate
these functions for $n = 1$ to 12, and decide whether each of them is multiplicative
or not (either proving it, or showing not by counterexample).

10. Use the estimate toward the end of Section 19.3 for $\sigma$ to find numbers for
which $\sigma(n) > 5n$ and $\sigma(n) > 6n$. (Possibly long.)


¹⁴www.lirmm.fr/~ochem/opn/

¹⁵www.lirmm.fr/~ochem/opn/mwrb2100.txt

¹⁶www.ams.org/journals/mcom/2015-84-295/S0025-5718-2015-02941-X/

¹⁷www.quantamagazine.org/mathematicians-open-a-new-front-on-an-ancient-number-problem-20200910/
¹⁸eulerarchive.maa.org/hedi/HEDI-2006-11.pdf

¹⁹aleph0.clarku.edu/~djoyce/java/elements/bookIX/propIX36.html

11. Discover and prove conditions for which $\tau(n)$ and $\sigma(n)$ are even and odd
numbers.

12. Show that if $n$ is odd then $\tau(n)$ and $\sigma(n)$ have the same parity.

13. For which types of $n$ is $\tau(n) = 4$?

14. Prove that if $n \equiv 7 \pmod 8$, then $8 \mid \sigma(n)$.


Exercise Group. Here are facts about various definitions beyond perfect
numbers in Subsection 19.4.2.


15. Show that every prime power is deficient.
16. Show that a multiple of an abundant number is abundant.
17. Find a 4-perfect number.
18. Compute “by hand” $\sigma_{-1}$ for the numbers up to 30. Come up with
and prove a criterion for when $\sigma_{-1} = 2$.
19. Find three pseudoperfect numbers less than 100.
20. Find a weird number less than 100.
21. In the proof of Algorithm 19.4.14, confirm that if $p_n$, $p_{n-1}$, and $q_n$
are prime, then the numbers in question are amicable.


22. Prove the first and second facts about the abundancy index in
Fact 19.4.10.

23. Find five numbers that must be abundancy outlaws based on the facts
(don’t just copy from the list).

24. Fill in the details in the proof of Theorem 19.5.2 (that odd perfect numbers
need at least three prime divisors, and that 3 and 5 would need to be the
first two if there were exactly three).

25. Read the article linked right after Fact 19.5.5 about Euler and odd perfect
numbers, and restate and reprove his criterion in modern notation.

26. There are always more connections. Here are some activities about a
formula one would have likely never guessed:

$$\sum_{d \mid n} \tau(d)^3 = \left(\sum_{d \mid n} \tau(d)\right)^2$$

First, test it out by hand with $n = 6$ and $n = 8$. Then try it with bigger
numbers below:

@interact
def _(n = 24):
    divs = divisors(n)
    pretty_print(html("The divisors of $%s$ are $%s$"%(n,divs)))
    # sigma(div,0) counts divisors, so it is tau(div)
    pretty_print(html("And $\\tau$ of each of them is $%s$"%([sigma(div,0) for div in divs])))
    pretty_print(html("The sums of the cubes and the square of the sum are $%s$ and $%s$, respectively!"%(sum([sigma(div,0)^3 for div in divs]),sum([sigma(div,0) for div in divs])^2)))


Start a proof by noting that it’s clearly true for a prime power n = p*, 
for which t(p/) = f +1, and all divisors of n look like such a power of p. 

Continue the proof by examining the proof of Theorem 19.2.7 for what 
can be said about the divisors of mn, and how a sum over divisors d | mn 
can be a product of two different sums over divisors of m and n. 
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27. 


28. 


29. 


30. 


Use Theorem 19.4.3 to show that the even perfect number is actually the 
sum of the positive integers up through its involved Mersenne prime p. 
(This is actually true for any number of this form in the theorem, but 
the theorem guarantees that any even perfect number has this form! See 
[E.7.37] for the interesting corollary that every even perfect number ends 
in 6 or 28.) 


Don’t read this exercise before you do Exercise 19.6.7! (A reading knowl- 
edge of French is presumed.) 

Mersenne eventually published a method of Fermat’s for finding 3- 
perfect numbers, some time after a “heated exchange of letters” among 
several mathematicians (including Descartes) about multiply perfect num- 
bers (see Vittoria Boria’s 1989 dissertation, Marin Mersenne: Educator 
of Scientists). Use the following image (as usual, courtesy BNF/Gallica) 
to recreate his method and obtain another 3-perfect number. Warning: it 
might be pretty large! 


[Scanned image of the French original, headed “Des parties aliquotes, de 120, & des nombres amiables”; the two worked examples in it arrive at 120 and 672.]


Figure 19.6.1 Excerpt from Nouvelle Observationes regarding triply per- 
fect numbers 


29. Find an odd abundant number by multiplying a bunch of distinct odd
numbers. Then do some historical research to find out whether de Bouvelles²⁰
was the first person to find one, in 1510, whether [E.4.5, Section
3.6] is correct that he did it, but in 1509, or whether ibn Tahir Al-Baghdadi
actually did it first in the eleventh century.


30. Suppose that, as in Theorem 19.4.3, you have a power of the form $2^{n-1}$.
Whether or not $2^n - 1$ is prime, one can still investigate what happens
when we multiply $2^{n-1}$ by prime numbers $p$ greater than or less than
$2^n - 1$. Is this number now deficient or abundant? What is the value of
$\sigma(N) - 2N$ for the resulting number $N = 2^{n-1}p$?

Investigate this question for $n = 3$ and $n = 5$, each time with two
primes greater than and less than $2^n - 1$. There is a consistent answer,
and even a formula in terms of $p$ and $n$. (See [E.5.11, Section 2.5] for
Thabit ibn Qurra’s and Muhammad Yazdi’s discoveries along these lines.)


²⁰mathshistory.st-andrews.ac.uk/Biographies/Bouvelles/


31. Consider the function $f(n) = \sigma(n) - n$. One can ask the question of
whether there are two different positive numbers n that give the same 
output for f; such numbers can be called balanced or equal weight. Find 
equal weight numbers which both have f(n) = 17. Can you find ones 
with f(n) = 57? (See [E.5.11, Section 2.6] and [E.7.45] for al-Baghdadi’s 
and Yazdi’s discoveries along these lines, as well as Dickson’s voluminous 
history.) 


Summary: Counting and Summing Divisors 


This chapter investigates the surprising wealth of questions arising from
one of the oldest arithmetic functions.


1. We first define $\sigma(n)$ in Definition 19.1.1, and encourage a lot of
exploration!

2. The next section proves a number of important facts about these sums,
including multiplicativity as a corollary of the quite general Theorem 19.2.7.

3. Section 19.3 explores the size of the sum of divisors function.

4. We next turn to a Characterization of Even Perfect Numbers. There are
many interesting definitions here, and we even discuss an ancient way to
Get Amicable Numbers.

5. Finally, we learn not only the statement of Theorem 19.5.2, but that no one
really is sure whether the numbers it concerns exist at all!


There is a very broad variety of Exercises looking at all the definitions, and 
their variations, related to summing divisors, ending with some interesting 
historical ones. 


Chapter 20 


Long-Term Function Behavior


We will now move on to think of these same functions in a different way from 
the previous chapter. We will examine different limits in number theory, and 
how integrals and calculus are inextricably bound up with this sort of question. 

If, after this chapter, you are interested in more of this kind of material,
definitely check out¹ Stopple’s excellent [E.4.5], to which I am indebted for
many of the ideas here, or the more challenging book [E.4.6] by Apostol.

Finally, note that some proficiency in calculus is helpful in understanding 
the results in this chapter, though a proper course is not necessarily a prereq- 
uisite. 


20.1 Sums of Squares, Once More 


Our motivational example will be the one we discussed in Section 18.1. Recall
that $r(n)$ denotes the (total) number of ways to represent $n$ as a sum of squares,
so that $r(3) = 0$ but $r(9) = 4$ and $r(5) = 8$. Then we saw in Fact 18.2.9, more
or less rigorously, that

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} r(k) = \pi. \]


20.1.1 Errors, not just limits 


As it happens, we can say something far more specific than just this limit.
Recall one of the intermediate steps in our proof.

\[ \pi\left(1 - \frac{\sqrt{2}}{\sqrt{n}} + \frac{1}{2n}\right) \le \frac{1}{n}\sum_{k=0}^{n} r(k) \le \pi\left(1 + \frac{\sqrt{2}}{\sqrt{n}} + \frac{1}{2n}\right) \]

Notice that if I subtract the limit, $\pi$, from the bounds, I can think of this in
terms of an error. Using absolute values, we get, for large enough $n$,

\[ \left| \frac{1}{n}\sum_{k=0}^{n} r(k) - \pi \right| \le C n^{-1/2}, \]

where the value of $C$ is not just $\pi\sqrt{2}$, but something a little bigger because of
the $\frac{1}{2n}$ term.


¹Two other books with useful presentations are the terse one in [E.2.9] and the more
intuitive, if shorter, one in [E.2.11]. [E.2.8, Section 3.8] has a deep but idiosyncratic presentation,
as evidenced by its starting with what we give as Proposition 24.6.7!

In the next two cells we set up some functions and then plot the actual
number of representations compared with the upper and lower bound implied
by this analysis. We include a static image at the end, but encourage you to
explore.


def r2(n):
    # powers of 2 do not affect the number of representations
    n = prime_to_m_part(n, 2)
    F = factor(n)
    ret = 4
    for a,b in F:
        if a%4==3:
            # a prime congruent to 3 mod 4 must appear to an even power
            if b%2==1:
                return 0
            else:
                n = prime_to_m_part(n,a)
        else:
            ret = ret * (b+1)
    return ret


def L(n):
    ls = []
    out = 0
    for i in range(1,n+1):
        out += r2(i)
        ls.append((i, out/i))
    return ls


@interact
def _(n=100):
    P = line(L(n))
    P += plot(pi+pi*sqrt(2)/sqrt(x),x,3,n,color='red')
    P += plot(pi-pi*sqrt(2)/sqrt(x),x,3,n,color='red')
    P += plot(pi,x,3,n,color='red',linestyle='--')
    show(P)


Figure 20.1.1 Error bounds for average of sum of squares 


Note that the actual number is well within the bounding curves given by the 
red lines, even for small n. This shows a general rule of thumb that, typically,
the constant we prove will be a lot bigger than necessary. New research is 
about improving such bounds. 


20.1.2 Landau notation 
It turns out there is a nice notation for how ‘big’ an error is. 


Definition 20.1.2 Big Oh. We say that $f(x)$ is $O(g(x))$ (“eff of eks is Big Oh
of gee of eks”) if there is some positive constant $C$ and some positive number
$x_0$ for which

\[ |f(x)| \le C g(x) \text{ for all } x > x_0. \]

This is known as Landau notation.
See Exercise Group 20.6.1-5 for some practice with this. In practice in
this text, we will focus on $C$ and elide details of $x_0$ unless it is crucial to the
narrative.
Example 20.1.3 The average number of representations of an integer as a
sum of squares is $\pi$, and if you do the average up to $N$, then the error will
be no worse than some constant times $1/\sqrt{N}$. So the error in the average is Big Oh of
$1/\sqrt{N}$, or $O(x^{-1/2})$.
It is unknown in this case just how small the error term really is. In 1906 it
was shown that it is $O(x^{-2/3})$ (note that this is a more accurate statement, see
Exercise 20.6.5). See Figure 20.1.4 for a visual representation, where $C = \pi$.


Figure 20.1.4 Better bound for average of sum of squares


It is also known that the error term is not as close as $O(x^{-3/4})$; see [E.7.25]
for much more information at an accessible level.
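
As a quick numerical sanity check of this example, here is a minimal sketch in plain Python (rather than Sage, so it runs anywhere). The helper names `count_lattice_points` and `average_error` are illustrative and not from the text, and the constant `C` is an assumption: it is just $\pi\sqrt{2}$ from the bounds recalled in Subsection 20.1.1, plus $\pi/2$ to absorb the $\frac{1}{2n}$ term.

```python
import math

def count_lattice_points(n):
    """Number of integer pairs (a, b) with a^2 + b^2 <= n, i.e. r(0) + ... + r(n)."""
    total = 0
    m = math.isqrt(n)
    for a in range(-m, m + 1):
        # for this a, b ranges over -isqrt(n - a^2), ..., isqrt(n - a^2)
        total += 2 * math.isqrt(n - a * a) + 1
    return total

def average_error(n):
    """Distance between the average number of representations up to n and pi."""
    return abs(count_lattice_points(n) / n - math.pi)

# assumed constant: pi*sqrt(2) from the bounds, plus pi/2 for the 1/(2n) term
C = math.pi * math.sqrt(2) + math.pi / 2

for n in (10, 100, 1000, 10000):
    print(n, average_error(n), C / math.sqrt(n))
```

The printed error should sit well below the bound in each row, which is the same rule of thumb as before: the constant we can prove is bigger than necessary.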


Now let’s apply these ideas to the divisor summation functions $\tau$ and $\sigma$ from
Definition 19.1.1 in the previous chapter. (We will use these common alternate
notations, $\tau$ for $\sigma_0$ and $\sigma$ for $\sigma_1$, from Remark 19.1.2 throughout this chapter.)
Namely, consider the following interesting question.


Question 20.1.5 What is the “average” number of divisors of a positive inte- 
ger? What is the “average” sum of divisors of a positive integer? 


It turns out that clever combinations of many ideas from the course as well
as calculus ideas will help us solve these questions! We will start with $\tau$ in
Section 20.2, and address $\sigma$ starting in Section 20.4. Finally, answering these
questions will motivate us to ask the (much harder) similar questions one can
ask about prime numbers, starting in Chapter 21.


20.2 Average of Tau 


20.2.1 Beginnings 


Let’s begin by observing Figure 20.2.1, which plots the average for $\tau$ up to
n = 100.


Figure 20.2.1 Average of $\tau$, number of divisors


Sage note 20.2.2 Try to be efficient. Observe the following two cells. The
first cell records the successive sums of $\tau$ in a variable out (for ‘output’), so that
we don’t have to recalculate the entire sum each time we compute the average
value for a different input value. We record the actual averages sequentially in
a separate list ls.

Then the interactive cell is very simple indeed. Try being efficient in your 
programming! 


def L(n):
    ls = []
    out = 0   # running total of tau(1) + ... + tau(i)
    for i in range(1,n+1):
        out += sigma(i,0)
        ls.append((i, out/i))
    return ls


@interact
def _(n=100):
    P = line(L(n))
    show(P)


This graphic shows how the average value of $\tau$ up to n changes as we let
n get bigger. This isn’t enough data to tell whether there is a limiting value
for the average value of $\tau(n)$, even if you look out to the first 1000 integers,
but it’s suggestive. Part of the unpredictability is from primes; every prime
number contributes just 2 to the total (and so reduces the average value)!

Nonetheless, thinking about this might lead us to look a little deeper. For
example, the ‘trend’ is concave down. So let’s look at comparing it with various
concave down functions. (The following interact also lets you multiply them
by a constant.)


@interact
def _(n=100, C=.5, f=[x^(1/2), x^(1/3), x^(1/4), log(x)]):
    # f is chosen from a list of concave down comparison functions
    f(x) = f
    P = line(L(n),legend_label=r'average of $\tau$')
    P += plot(C*f,(x,1,n), color='black', linestyle='--',
        legend_label='$%s%s$'%(RDF(C),latex(f(x))))
    show(P)


At the very least I can estimate that the average value is Big Oh of a certain 
function. But how does it go on? 


Figure 20.2.3 Average of $\tau$ to one million


In Figure 20.2.3 we have our graph of averages of $\tau(n)$ versus n, out to
one million. Certainly this looks akin to some fractional exponent function.
On the other hand, it seems to grow more slowly than $\sqrt{x} = x^{1/2}$, our initial
estimate in the interact, so if it is, the exponent must be pretty small. (If you
are familiar with semilog or log-log plots and are willing to look up how to do
them in Sage, see Exercise 20.6.7 and then try to plot this on those axes.)


20.2.2 Heuristics for tau 


We’ll start with a heuristic, going right back to the sieve of Eratosthenes.
In that algorithm (6.2.3), we proved that in order to test whether $n$ is prime,
you just have to check all numbers up through $\sqrt{n}$. This is because any divisor
$\sqrt{n} < d < n$ implies the existence of a divisor $\frac{n}{d}$ such that

\[ 1 = \frac{n}{n} < \frac{n}{d} < \frac{n}{\sqrt{n}} = \sqrt{n}. \]

So the absolute most number of divisors possible (for a given $n$) is if every
number $d$ less than $\sqrt{n}$ was a divisor, and then all the $\frac{n}{d} > \sqrt{n}$ you get were
also divisors.

This is a silly idea beyond very small $n$, but let’s go with it anyway. Even
if all those divisors were there, you would have $\tau(n) = 2\lfloor\sqrt{n}\rfloor \le 2\sqrt{n}$ so that
$\tau(n)$ is $O(\sqrt{n})$.

Example 20.2.4 For $n = 24$ this idea is actually true. We can line these up in
pairs as (1,24), (2,12), (3,8), (4,6), and that gives $2\cdot\lfloor\sqrt{24}\rfloor = 8$ total divisors.


That estimate is very important! It means we can get a sense of a first
bound on the average value of $\tau$. At the very least we have that

\[ \frac{1}{n}\sum_{k=1}^{n} \tau(k) \le \frac{1}{n}\sum_{k=1}^{n} 2\sqrt{k}. \]

20.2.3 Using sums to get closer 


Let’s rewrite this inequality in a more suggestive form by noting $k = n(k/n)$:

\[ \frac{1}{n}\sum_{k=1}^{n} \tau(k) \le \sum_{k=1}^{n} \frac{1}{n}\, 2\sqrt{n(k/n)} \]

This form looks an awful lot like a Riemann sum with $x = k/n$ and $\Delta x = \frac{1}{n}$.

To review, recall writing a Riemann sum² for $\int_0^1 x^2\, dx$ in the form

\[ \frac{1}{n}\left(\frac{1}{n}\right)^2 + \frac{1}{n}\left(\frac{2}{n}\right)^2 + \cdots + \frac{1}{n}\left(\frac{n}{n}\right)^2. \]

(If you need a calculus refresher, there are several great free calculus texts in
the American Institute of Mathematics list of approved textbooks³.)

Doing the same type of summation for the function $2\sqrt{nx}$ would give

\[ \sum_{k=1}^{n} \frac{1}{n}\, 2\sqrt{n(k/n)} \approx \int_0^1 2\sqrt{nx}\, dx = 2\sqrt{n}\int_0^1 \sqrt{x}\, dx = \frac{4}{3}\sqrt{n}. \]


That certainly suggests that the average of $\tau$ might be $O(\sqrt{n})$ with $C = 4/3$.
To make this rigorous, we will need to make a slight change of point of view
in order to ensure it will be viewed as a left-hand sum of an increasing function
(and hence the Riemann sum is less than the actual value of the integral).
Namely, consider that

\[ \sum_{k=1}^{n} \frac{1}{n}\, 2\sqrt{k} = \sum_{k=1}^{n} \left(\frac{1}{n}\right) 2\sqrt{n}\sqrt{k/n} = \sum_{k=0}^{n-1} \left(\frac{1}{n}\right) 2\sqrt{n}\sqrt{k/n + 1/n} \le \int_0^1 2\sqrt{n}\sqrt{x + 1/n}\, dx. \]

This integral evaluates to

\[ \frac{4}{3}\sqrt{n}\left(\left(1 + \frac{1}{n}\right)^{3/2} - \left(\frac{1}{n}\right)^{3/2}\right). \]

The big extra factor on the right can be shown to be decreasing as a function
of $n$ (using derivatives), and hence is always less than 2 for positive integers
(plug in $n = 1$ to see), so the entire expression will always be less than $\frac{8}{3}\sqrt{n}$.


²activecalculus.org/single/sec-4-2-Riemann.html
³aimath.org/textbooks/approved-textbooks/
Thus one can write

\[ \frac{1}{n}\sum_{k=1}^{n} \tau(k) \le \frac{1}{n}\sum_{k=1}^{n} 2\sqrt{k} \le \frac{8}{3}\sqrt{n} \]

so that the average value is bounded by a constant times $\sqrt{n}$ and is hence
$O(\sqrt{n})$. This implies, perhaps, that the average number of divisors goes steadily
up! (If so, it guarantees that the trend is, on the whole, concave down.)
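
The chain of inequalities in this subsection can be spot-checked numerically. This is only a sketch in plain Python with illustrative names (`left_sum`, `integral_value`): it compares the left-hand sum $\frac{1}{n}\sum 2\sqrt{k}$, the closed form of the integral of $2\sqrt{n}\sqrt{x + 1/n}$ over $[0,1]$, and the final bound $\frac{8}{3}\sqrt{n}$.

```python
import math

def left_sum(n):
    """(1/n) * sum_{k=1}^{n} 2*sqrt(k), the bound on the average of tau."""
    return sum(2 * math.sqrt(k) for k in range(1, n + 1)) / n

def integral_value(n):
    """Closed form of the integral of 2*sqrt(n)*sqrt(x + 1/n) over [0, 1]."""
    return (4 / 3) * math.sqrt(n) * ((1 + 1 / n) ** 1.5 - (1 / n) ** 1.5)

for n in (1, 10, 100, 1000):
    # each row should be increasing from left to right
    print(n, left_sum(n), integral_value(n), (8 / 3) * math.sqrt(n))
```

A numerical check like this is no substitute for the calculus details, but it is a cheap way to catch a sign or index slip in the argument.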


20.2.4 But Big-Oh isn’t enough 


However, we might also want to know what the average value of $\tau$ is. The
preceding subsections only tell us what it’s less than! In the next interact, it
seems that it’s hard to find the “right” value of $C$ so that the average value
would be the same order as $\sqrt{n}$.


def L(n):
    ls = []
    out = 0
    for i in range(1,n+1):
        out += sigma(i,0)
        ls.append((i, out/i))
    return ls


# computing the averages out to one million takes a little while
P = line(L(1000000))
@interact
def _(a=.02,n=2):
    show(P + plot(a*x^(1/n), (x,1,10^6),
        color='red', linestyle='--'))
    pretty_print(html(r"Blue is the average value of $\tau$"))
    pretty_print(html("Red is $%sx^{1/%s}$"%(a,n)))


Try $x^{1/3}$ in the interact; it doesn’t seem to make matters any better.

We don’t even have to stop with the average; one can directly show that the
unadorned function $\tau(n) = O(\sqrt[3]{n})$. Here are the steps one might take. We
make fleshing out the details Exercise 20.6.10 (adapted from [E.4.5, Section
8.8]):

• First, note that $\tau$ is multiplicative.

• For a given prime $p$, note that $\tau(p^x) = x + 1$ grows much more slowly
than $(p^x)^{1/3} = p^{x/3}$, which is exponential in $x$.

  ◦ What value do each of these have at $x = 0$?

  ◦ Take derivatives of both functions at $x = 0$ to show that the growth
  statement is definitely true for $p \ge 23$.

  ◦ Show that for each prime $p$ less than 23 there is an $x_p$ such that the
  growth statement is true after $x_p$.

• Put these pieces of information together to show that $\tau$ is $O(x^{1/3})$.

There is nothing special in this section about $x^{1/2}$ or $x^{1/3}$ (other than easier
calculations). Any $x^{1/n}$ will do.
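
To get a feel for which constant $C$ might work for a given exponent, one can simply look for the largest value of $\tau(n)/n^{1/n}$-style ratios over a range. Below is a hedged sketch in plain Python; `tau_table` and `largest_ratio` are names invented for this illustration, and a finite search like this only suggests a constant, it proves nothing.

```python
def tau_table(limit):
    """tau[n] for 0 <= n <= limit, via a sieve over divisors."""
    tau = [0] * (limit + 1)
    for d in range(1, limit + 1):
        # d divides exactly the multiples d, 2d, 3d, ...
        for multiple in range(d, limit + 1, d):
            tau[multiple] += 1
    return tau

def largest_ratio(limit, exponent):
    """Return (max of tau(n)/n^exponent, an n attaining it) for 1 <= n <= limit."""
    tau = tau_table(limit)
    return max((tau[n] / n ** exponent, n) for n in range(1, limit + 1))

for limit in (100, 1000, 10000):
    print(limit, largest_ratio(limit, 1 / 3), largest_ratio(limit, 1 / 2))
```

Watching where the largest ratio occurs as the limit grows connects nicely with the prime-by-prime steps outlined above.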
20.3 Digging Deeper and Finding Limits


So where does the number of divisors function go? To answer this, we will look
at a very different graph!

The fundamental observation we will use is that $\tau(n)$ is precisely the same
as the number of positive pairs of integers $(x, y)$ such that $xy = n$. Before
going on, spend some time convincing yourself of this.

Then, if we translate $xy = n$ to a graph of $y = n/x$ and $(x, y)$ to a lattice
point, we get the visualization⁴ in Figure 20.3.1.


Figure 20.3.1 Lattice points and hyperbolas 


20.3.1 Moving toward a proof 


To be more in line with our previous notation, we will say that $\tau(n)$ is exactly
given by the number of positive integer points $(d, \frac{n}{d})$ with the property that
$d \cdot \frac{n}{d} = n$. Now we can interpret $\sum_{k=1}^{n} \tau(k)$ as the number of lattice points on
or under the hyperbola $y = n/x$.

This is a completely different way of thinking of the divisor function! We 
can see it for various sizes in the interact below. 


@interact
def _(n=(15, list(range(2,50)))):
    viewsize=n+1
    g(x)=1/x
    P=Graphics()
    P += plot(n*g,(x,0,n+1))
    P += plot(2*g,(x,0,n+1), linestyle="--")
    if n>7:
        P += plot((n-5)*g,(x,0,n+1),linestyle="--")
    grid_pts = [[i,j] for i in [1..viewsize] for j in
        [1..viewsize]]
    P += points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
    # lattice points on or under the hyperbola xy = n
    lattice_pts = [coords for coords in grid_pts if
        (coords[0]*coords[1]<=n)]
    P += points(lattice_pts, rgbcolor = (0,0,1),pointsize=20)
    show(P, ymax=viewsize, aspect_ratio=1)


⁴See texts such as [E.4.5] or [E.2.11], though probably I like [E.2.11, Figure 15-5] best as
inspiration since it includes several of the curves at once as I do here.


So what we will do is try to look at the lattice points as approximating an
area! Just like with the sum of squares function (recall Subsection 18.2.3 and
Section 20.1), we will exploit the geometry. For each lattice point involved in
$\sum_{k=1}^{n} \tau(k)$, we put a unit square to the lower right.


Figure 20.3.2 Lattice points, hyperbolas, and squares 


In examining this graph, we will interpret the lattice points as two different 
sums. 


• We can think of it as $\sum_{k=1}^{n} \tau(k)$, adding up the lattice points along each
hyperbola.

• We can think of it as $\sum_{j=1}^{n} \left\lfloor \frac{n}{j} \right\rfloor$, adding up the lattice points in each
vertical column.
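
These two ways of counting the same lattice points can be compared directly. Here is a minimal sketch in plain Python (the function names are made up for this illustration): one function adds up $\tau(k)$ hyperbola by hyperbola, the other adds up the column heights $\lfloor n/j \rfloor$.

```python
def tau(n):
    """Number of positive divisors of n (simple trial division)."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def count_along_hyperbolas(n):
    """Lattice points on xy = k for k = 1, ..., n."""
    return sum(tau(k) for k in range(1, n + 1))

def count_by_columns(n):
    """Lattice points in column x = j under xy = n: there are floor(n/j) of them."""
    return sum(n // j for j in range(1, n + 1))

for n in (6, 15, 100):
    print(n, count_along_hyperbolas(n), count_by_columns(n))
```

The column count needs no factoring at all, which is one reason this change of viewpoint is so useful.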


The area of the squares can then be thought of as another Riemann-type sum,
similar to our summation of $\tau$.
It should be clear that the area, an estimate for the sum, is “about”

\[ \int_1^n \frac{n}{x}\, dx = n\log(x)\Big|_1^n = n\log(n) - n\log(1) = n\log(n) \]

where the logarithm is the ‘natural’ one.


Definition 20.3.3 Throughout this text we use log(n) to mean the natural
logarithm with base e.


See [E.4.5, Figure 4.3]. 
Why is this integral actually a good estimate, though? The answer is in the
error! 


Figure 20.3.4 Lattice points, squares, and error 


Look at the shaded difference between the area under the curve (which is
$n\log(n)$) and the area of the red squares (which is the sum of all the $\tau$ values).

• All the areas where the red squares are above the hyperbola add up to
less than $n$, because they are all 1 in width or less, and do not intersect
vertically (they stack, as it were).

• Similarly, all the areas where the hyperbola is higher add up to less
than $n$, because they are all 1 in height or less, and are horizontally
non-intersecting.

(Actually, we would expect they would cancel quite a bit ... and they do, as we
will see. We don’t need that yet.)

I find these points to be easier to see if you try a few different options in 
the interact below. 


@interact
def _(n=(8, list(range(2,25)))):
    viewsize=n+1
    g(x)=1/x
    P1 = Graphics()
    P1 += plot(n*g,(x,0,n), ymax=viewsize, aspect_ratio=1,
        xmin=0, xmax=n+1)
    # shade the difference between the squares and the area under the hyperbola
    P1 += plot(piecewise([[(j,j+1),floor(n/j)] for j in
        [1..n-1]]),(x,1,n),fill=n/x,fillalpha=.3,
        linestyle='') + plot(1,(x,n,n+1),fill=True,
        fillalpha=.3,linestyle='')
    P2 = plot(n*g,(x,0,n+1), ymax=viewsize, aspect_ratio=1)
    # shade the area under the hyperbola itself
    P2 += plot(n*g,(x,1,n),fill=True, fillalpha=.3)
    grid_pts = [[i,j] for i in [1..viewsize] for j in
        [1..viewsize]]
    P1 += points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
    P2 += points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
    lattice_pts = [coords for coords in grid_pts if
        (coords[0]*coords[1]<=n)]
    P1 += points(lattice_pts, rgbcolor = (0,0,1),pointsize=20)
    P2 += points(lattice_pts, rgbcolor = (0,0,1),pointsize=20)
    # one unit square to the lower right of each lattice point
    squares=[line([[k,l],[k+1,l],[k+1,l-1],[k,l-1],[k,l]],
        rgbcolor=(1,0,0)) for [k,l] in lattice_pts]
    for object in squares:
        P1 += object
        P2 += object
    show(graphics_array([P1,P2]))
    pretty_print(html(r"Error between sum of $\tau(n)$ up through $%s$, and $%s\log(%s)$"%(n,n,n)))
We can summarize this discussion in the following three implications.


Fact 20.3.5

• The error $\sum_{k=1}^{n} \tau(k) - n\log(n)$ is a positive real number less than $n$
minus a (different positive real) number less than $n$.

• So the error is certainly $O(n)$ (less than some multiple of $n$ as $n$ gets
huge).

• So, the error in the average is less than some constant as $n$ gets huge!
I.e.,

\[ \frac{1}{n}\sum_{k=1}^{n} \tau(k) - \log(n) = O(1). \]

(Recall we use log(n) to mean the natural logarithm.)
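
This fact is easy to probe numerically. The sketch below is plain Python with illustrative names (`divisor_sum_total`, `error`); it uses the column count $\sum_j \lfloor n/j \rfloor$ for the sum of the $\tau$ values and compares it with $n\log(n)$. It only checks the claim on a finite range, starting at $n = 2$.

```python
import math

def divisor_sum_total(n):
    """sum_{k<=n} tau(k), counted column by column as the sum of floor(n/j)."""
    return sum(n // j for j in range(1, n + 1))

def error(n):
    """Difference between the sum of tau values up to n and n*log(n)."""
    return divisor_sum_total(n) - n * math.log(n)

for n in (10, 100, 1000, 10000):
    # the last column is the error in the average
    print(n, error(n), error(n) / n)
```

The last column stays bounded, as the fact predicts, and in fact seems to settle down, which is what the next pages investigate.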


We can verify this graphically by plotting the average value against log(n). 
Figure 20.3.6 Average of $\tau$ versus log


Lookin’ good! There does seem to be some predictable error. What might 
it be? Drawing inspiration from [E.4.5, Figure 4.5], we plot it: 


Figure 20.3.7 Error of $\tau$ versus log


Observe Figure 20.3.7. Keeping $x = 0$ in view, the error seems to be
somewhat less than 0.2, although it clearly bounces around a bit. The long-term
value seems to settle roughly between 0.15 and 0.16, as $x$ gets large. So
will this give us something more precise?
20.3.2 Getting a handle on error


To answer this, we will try one more geometric trick. 


Figure 20.3.8 Lattice points, 7, and symmetry 


Notice we have now divided the lattice points up into three parts, two of
which are ‘the same’:

• The ones on the line y = x.

• The lattice points above the line and below the hyperbola.

• The lattice points to the right of the line and below the hyperbola.

Try it interactively, and perhaps see if there is a formula for how many of each
type there are.


@interact
def _(n=(8, list(range(2,25)))):
    viewsize=n+1
    g(x)=1/x
    P=Graphics()
    P += plot(n*g,(x,0,n+1))
    P += plot(2*g,(x,0,n+1),linestyle="--")
    if n>7:
        P += plot((n-5)*g,(x,0,n+1),linestyle="--")
    grid_pts = [[i,j] for i in [1..viewsize] for j in
        [1..viewsize]]
    P += points(grid_pts, rgbcolor=(0,0,0),pointsize=2)
    lattice_pts = [coords for coords in grid_pts if
        (coords[0]*coords[1]<=n)]
    P += points(lattice_pts, rgbcolor = (0,0,1),pointsize=20)
    # the dashed line y = x splits the lattice points into symmetric halves
    P += plot(x,(x,0,viewsize),
        linestyle="--",rgbcolor=(0,0,0))
    show(P, ymax=viewsize, aspect_ratio=1)
Now let’s count. First, there are exactly $\lfloor\sqrt{n}\rfloor \le \sqrt{n}$ points on the line. At
each integer y-value $d$ up to $y = \sqrt{n}$, there are $\lfloor n/d \rfloor - d$ points to the right of
the line and below the hyperbola. Analogously, at each integer x-value $d$ up
to $x = \sqrt{n}$, there are $\lfloor n/d \rfloor - d$ points above the line and below the
hyperbola. (These numbers are all nonnegative since $d \le \sqrt{n}$.)

Combine these computations as sums over the integers $d \le \sqrt{n}$ and
remove the floors to get an easier approximation:

\[ \sum_{k=1}^{n} \tau(k) = \lfloor\sqrt{n}\rfloor + \sum_{d\le\sqrt{n}} \left(\lfloor n/d \rfloor - d\right) + \sum_{d\le\sqrt{n}} \left(\lfloor n/d \rfloor - d\right) \le \sqrt{n} + 2\sum_{d\le\sqrt{n}} \left(n/d - d\right). \]

Because the floor of any number is less than the number itself by at most one
for each $d$, the total error gained using this inequality is at most the number
of terms in the sum, or $1 + 2\sqrt{n} = O(\sqrt{n})$.

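Next we rewrite this using the formula for the sum of the first $\ell$ integers
(Example 1.2.4), using $\ell = \lfloor\sqrt{n}\rfloor$ and subsuming all the $\sqrt{n}$ pieces:

\[ \sum_{k=1}^{n} \tau(k) = 2n\sum_{d\le\sqrt{n}} \frac{1}{d} - 2\sum_{d\le\sqrt{n}} d + O(\sqrt{n}) \]
\[ = 2n\sum_{d\le\sqrt{n}} \frac{1}{d} - 2\left(\frac{\lfloor\sqrt{n}\rfloor\left(\lfloor\sqrt{n}\rfloor + 1\right)}{2}\right) + O(\sqrt{n}). \]

Once⁶ $n > 4$, the absolute value of the difference between $\frac{n}{2}$ and $\frac{\lfloor\sqrt{n}\rfloor(\lfloor\sqrt{n}\rfloor + 1)}{2}$
is $O(\sqrt{n})$ (with $C = 1/2$, in fact), so using some of the work in Exercise
Group 20.6.1-5 we finally get that

\[ \sum_{k=1}^{n} \tau(k) = 2n\sum_{d\le\sqrt{n}} \frac{1}{d} - n + O(\sqrt{n}) \Longrightarrow \frac{1}{n}\sum_{k=1}^{n} \tau(k) = 2\sum_{d\le\sqrt{n}} \frac{1}{d} - 1 + O(1/\sqrt{n}). \]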
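
Before the floors are removed, the three-region count in this subsection is an exact identity, and that is easy to confirm by machine. The following plain-Python sketch uses names chosen for illustration (`total_by_symmetry`, `total_by_columns`); it compares the count using the line $y = x$ with the straightforward column count.

```python
import math

def total_by_symmetry(n):
    """Count lattice points under xy = n using the diagonal and two mirror-image regions."""
    root = math.isqrt(n)  # number of points on the line y = x
    # for each d up to sqrt(n), floor(n/d) - d points lie strictly off the diagonal
    off_diagonal = sum(n // d - d for d in range(1, root + 1))
    return root + 2 * off_diagonal

def total_by_columns(n):
    """Count the same lattice points one vertical column at a time."""
    return sum(n // j for j in range(1, n + 1))

for n in (8, 24, 1000):
    print(n, total_by_symmetry(n), total_by_columns(n))
```

Note that the symmetric version only loops up to $\sqrt{n}$, so it is also a much faster way to compute the sum of the $\tau$ values.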


20.3.3 The end of the story 


We’re almost at the end of the story! It’s been a while since we explored the
long-term average of $\tau$ in Subsection 20.2.1; at that point, you likely convinced
yourself that log(n) is close to the average value of $\tau$.

So now we just need to relate the sum $2\sum_{d\le\sqrt{n}} \frac{1}{d} - 1$ to log(n). I wish to
emphasize just how small the error term $O(1/\sqrt{n})$ is!


⁶These computations are just one of the many places where George Jennings caught subtle
inaccuracy or incompleteness in wording, which has improved the text greatly.


Figure 20.3.9 Difference between harmonic series and log

Figure 20.3.9 shows the exact difference between $\sum_{k=1}^{m-1} \frac{1}{k}$ and log(m). Clearly,
even as $m \to \infty$, the total area is simply the sum of a bunch of nearly-triangles
with width exactly one and no intersection of height (again this idea), with
total height less than 1. So the difference between $\sum_{k=1}^{m-1} \frac{1}{k}$ and log(m) will be
finite as $m \to \infty$.


This number is very important! First of all, it clearly is related to the
archetypal divergent series from calculus, the harmonic series

\[ \sum_{k=1}^{\infty} \frac{1}{k}. \]

However, this constant has taken on a life of its own.


Definition 20.3.10 The number $\gamma$, or the Euler-Mascheroni constant, is defined by

\[ \gamma = \lim_{m\to\infty} \left( \sum_{k=1}^{m-1} \frac{1}{k} - \log(m) \right). \]


Remark 20.3.11 You have almost certainly never heard of this number, but
it is very important. There is even an entire book, by Julian Havil [E.4.15],
about this number. It’s a pretty good book, in fact!

Among other crazy properties, $-\gamma$ is the derivative (at $x = 1$) of a generalized
factorial function, called Gamma ($\Gamma$). I am not making this up.

Most baffling of all, $\gamma$ is not known to be either rational or irrational. Maybe
you will solve this mystery?

Consider the area corresponding to $\gamma$ compared to its finite approximations.
Notice that the “missing” part of the area (since we can’t actually view all the
way out to infinity) must be less than $1/m$, since it will be the part lower than
all the pieces we can see in the graphic for any given $m$. So $\gamma$ is within $O(1/m)$
of any given finite approximation $\sum_{k=1}^{m-1} \frac{1}{k} - \log(m)$. Adapted to our context,
we have

\[ \sum_{d\le\sqrt{n}} \frac{1}{d} = \log(\sqrt{n}) + \gamma + O(1/\sqrt{n}). \]
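
The claim that each finite approximation is within $1/m$ of $\gamma$ can be watched in action. This is a small sketch in plain Python; the function name `gamma_approximation` is illustrative, and the reference value `GAMMA` is the standard decimal expansion of the constant, typed in by hand rather than computed.

```python
import math

def gamma_approximation(m):
    """sum_{k=1}^{m-1} 1/k - log(m), the finite approximation to gamma."""
    return sum(1 / k for k in range(1, m)) - math.log(m)

GAMMA = 0.5772156649015329  # reference value of the Euler-Mascheroni constant

for m in (10, 100, 1000, 10000):
    approx = gamma_approximation(m)
    # the gap should be smaller than 1/m in every row
    print(m, approx, abs(GAMMA - approx), 1 / m)
```

The approximations creep up toward $\gamma$ quite slowly, which is typical for anything tied to the harmonic series.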
Now we put it all together! We know from above that

\[ \frac{1}{n}\sum_{k=1}^{n} \tau(k) = 2\sum_{d\le\sqrt{n}} \frac{1}{d} - 1 + O(1/\sqrt{n}). \]

Further, we can substitute for $\sum_{d\le\sqrt{n}} \frac{1}{d}$ as in our discussion of $\gamma$, and then
take advantage of the log fact that $2\log(z) = \log(z^2)$. Then we get

\[ \frac{1}{n}\sum_{k=1}^{n} \tau(k) = \log(n) + (2\gamma - 1) + O(1/\sqrt{n}). \]

That is exactly the asymptote and type of error that I have depicted in Figure 20.3.12!


Figure 20.3.12 Reassessing the error in $\tau$


You can see this is a fairly sharp result. (It’s even possible to show that the
error in the average is $O(x^{-2/3}\log x)$, but is not $O(x^{-3/4})$; once again see [E.7.25]
for much more information.)
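
Here is one way to see the sharpness numerically, as a sketch in plain Python. The names `average_tau` and `predicted` are illustrative, `GAMMA` is a typed-in reference value, and the rescaling by $\sqrt{n}$ in the last column is simply a convenient way to see whether the gap really behaves like $O(1/\sqrt{n})$.

```python
import math

GAMMA = 0.5772156649015329  # reference value of the Euler-Mascheroni constant

def average_tau(n):
    """Average of tau(1), ..., tau(n), using the column count sum of floor(n/j)."""
    return sum(n // j for j in range(1, n + 1)) / n

def predicted(n):
    """The main terms log(n) + 2*gamma - 1 from the text."""
    return math.log(n) + 2 * GAMMA - 1

for n in (10, 100, 1000, 10000):
    gap = average_tau(n) - predicted(n)
    # if the gap is O(1/sqrt(n)), then gap * sqrt(n) should stay bounded
    print(n, average_tau(n), predicted(n), gap * math.sqrt(n))
```

In experiments like this the rescaled gap tends to shrink rather than merely stay bounded, consistent with the stronger results mentioned above.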


20.4 Heuristics for the Sum of Divisors 


20.4.1 Numbers instead of points 


Could this type of argument conceivably be used for $\sigma = \sigma_1$?


The answer is yes! Consider the following rewrite of the sum of sigmas,
which are themselves the sum of divisors:

\[ \sum_{n\le x} \sigma(n) = \sum_{n\le x} \sum_{d\mid n} d = \sum_{q,d \text{ such that } qd\le x} d = \sum_{d\le x} \sum_{q\le x/d} q. \]

We have changed from a sum of sums of divisors (which might not be consecutive,
and makes $\sigma$ annoying to compute) to a sum of sums of consecutive
integers.⁷


⁷Most proofs of the ideas in this section are quite terse, which was inappropriate for my
students; I have drawn from [E.4.5, Chapter 4.4], [E.2.9, Section 22], and [E.4.6, Theorem
3.4].
We can think about this graphically again. Instead of comparing points on
a hyperbola with points in columns or rows, though, we will compare numbers
at points on a hyperbola with numbers at points in rows. We can think of it
as summing up a weighted set of points. Consider Figure 20.4.1.


Figure 20.4.1 Labeled lattice points for $\sigma$


Example 20.4.2 In Figure 20.4.1 we see (by following hyperbolas $xy = n$, up
through the graphed one $xy = 6$) that

\[ \sum_{k=1}^{6} \sigma(k) = 1 + (1+2) + (1+3) + (1+2+4) + (1+5) + (1+2+3+6). \]

Then we can rearrange this to go along rows instead as

\[ (1+2+3+4+5+6) + (1+2+3) + (1+2) + 1 + 1 + 1, \]

which means we can think of it as a sum of sums from 1 to the length of each
row.

Use the following interact to confirm that each row is $\left\lfloor \frac{n}{d} \right\rfloor$ in length, as
with $\tau$.


@interact
def _(n=(6, list(range(2,50)))):
    viewsize=n+1
    g(x)=1/x
    P=Graphics()
    P += plot(n*g,(x,0,n+1))
    grid_pts = [[i,j] for i in [1..viewsize] for j in
        [1..viewsize]]
    P += points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
    lattice_pts = [coords for coords in grid_pts if
        (coords[0]*coords[1]<=n)]
    for thing in lattice_pts:
        # label each lattice point with its x-coordinate
        P += text(thing[0], thing, rgbcolor=(0,0,0))
    show(P, ymax=viewsize, aspect_ratio=1)


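Let’s take stock of the graphic and $\sigma$.

• Each row has $\left\lfloor \frac{x}{d} \right\rfloor$ integers.

• Adding up the first $j$ integers (from one to $j$) has formula

\[ \frac{j(j+1)}{2} = \frac{j^2}{2} + \frac{j}{2} \]

(recall again Example 1.2.4).

• With $j = \left\lfloor \frac{x}{d} \right\rfloor$, the most wrong $\frac{j(j+1)}{2}$ can be from $\frac{1}{2}\left(\frac{x}{d}\right)^2$ is $j + 1 = O\left(\frac{x}{d}\right)$ (this is
simple algebra).

If we combine all this information, we get

\[ \sum_{n\le x} \sigma(n) = \sum_{d\le x} \sum_{q\le x/d} q = \sum_{d\le x} \left[ \frac{1}{2}\left\lfloor \frac{x}{d} \right\rfloor^2 + \frac{1}{2}\left\lfloor \frac{x}{d} \right\rfloor \right] \]
\[ = \sum_{d\le x} \left[ \frac{1}{2}\left(\frac{x}{d}\right)^2 + O\left(\frac{x}{d}\right) \right]. \]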
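
The row-by-row rearrangement is exact, and the size of the error made by dropping the floors can be checked as well. The sketch below is plain Python with names invented for this illustration (`total_along_hyperbolas`, `total_by_rows`, `smooth_estimate`); the bound used in the last comparison is the crude per-row bound $x/d + 1$.

```python
def sigma(n):
    """Sum of the positive divisors of n (simple trial division)."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

def total_along_hyperbolas(x):
    """sigma(1) + ... + sigma(x), one hyperbola at a time."""
    return sum(sigma(n) for n in range(1, x + 1))

def total_by_rows(x):
    """The same total, as a sum of 1 + 2 + ... + floor(x/d) over the rows d."""
    total = 0
    for d in range(1, x + 1):
        j = x // d                 # length of row d
        total += j * (j + 1) // 2  # 1 + 2 + ... + j
    return total

def smooth_estimate(x):
    """Sum of (1/2)(x/d)^2 over d <= x, with the floors removed."""
    return sum(0.5 * (x / d) ** 2 for d in range(1, x + 1))

for x in (6, 50, 200):
    crude_bound = sum(x / d + 1 for d in range(1, x + 1))
    print(x, total_by_rows(x), smooth_estimate(x), crude_bound)
```

For $x = 6$ both exact totals agree with the example above, and the gap to the smooth estimate stays under the crude bound.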


20.4.2 Order calculations and more 


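But this is actually possible to analyze! First, we perform some order calculations.
We already saw that $\sum_{d\le x} \frac{1}{d} = \log(x) + O(1)$, so

\[ \sum_{d\le x} \frac{x}{d} = x\sum_{d\le x} \frac{1}{d} = O(x\log(x)). \]

(See Exercise 20.6.15.) Also, $\sum_{d\le x} O\left(\frac{x}{d}\right)$ must be

\[ O\left(x\sum_{d\le x} \frac{1}{d}\right) = O(x\log(x)). \]

Next, let’s get more information about $\sum_{d\le x} \left[\frac{1}{2}\left(\frac{x}{d}\right)^2\right]$. Recall that the
(convergent) improper integral $\int_x^{\infty} \frac{dy}{y^2}$ approximates $\sum_{d>x} \frac{1}{d^2}$.
Since both converge, and by the same pictures as above, the error is certainly
$O(1/x^2)$. Then I can rewrite things as

\[ \sum_{d\le x} \frac{1}{d^2} = \sum_{d=1}^{\infty} \frac{1}{d^2} - \sum_{d>x} \frac{1}{d^2} = \sum_{d=1}^{\infty} \frac{1}{d^2} - \int_x^{\infty} \frac{dy}{y^2} + O(1/x^2) = \sum_{d=1}^{\infty} \frac{1}{d^2} - \frac{1}{x} + O(1/x^2). \]

Thus the whole crazy double sum can be approximated as follows, quite
accurately:

\[ \sum_{n\le x} \sigma(n) = \sum_{d\le x} \frac{1}{2}\left(\frac{x}{d}\right)^2 + O(x\log(x)) \]
\[ = \frac{x^2}{2}\left(\sum_{d=1}^{\infty} \frac{1}{d^2} - \frac{1}{x} + O(1/x^2)\right) + O(x\log(x)) \]
\[ = \frac{x^2}{2}\sum_{d=1}^{\infty} \frac{1}{d^2} + O(x\log(x)). \]

And the average value of $\sigma$ must be this divided by $x$, namely

\[ \frac{1}{x}\sum_{n\le x} \sigma(n) = \frac{x}{2}\sum_{d=1}^{\infty} \frac{1}{d^2} + O(\log(x)). \]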


Since we know that the series converges, this means the average value of $\sigma$
increases quite linearly, with an error (at most) increasing logarithmically! This
might be a shock: that one could actually get something fairly accurate like
this relatively easily using calculus ideas like improper integrals and (implicitly)
the integral test for infinite series. But check out the data!


Figure 20.4.3 Plotting average of $\sigma$ for n = 10, 1000


Of course, one might ask what the slope of this line is! It would have to be
$m = \frac{1}{2}\sum_{d=1}^{\infty} \frac{1}{d^2}$. Have you seen this constant before? (In a calculus class, you
should have proved that it does converge.)


Figure 20.4.4 Comparing average of $\sigma$ with a line


Finding the sum of this series was the so-called Basel problem⁸, which Euler
solved and showed is $\frac{\pi^2}{6}$. So the slope is $m = \frac{\pi^2}{12}$. Amazing! (See also Section 24.4.)
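
The slope can be estimated from data as well. This is a hedged sketch in plain Python rather than a Sage cell; `sigma_table` and `average_sigma` are illustrative names, and the sieve is just one convenient way to get all the values of $\sigma$ at once.

```python
import math

def sigma_table(limit):
    """sigma[n] for 0 <= n <= limit, via a sieve over divisors."""
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        # d contributes to sigma of each of its multiples
        for multiple in range(d, limit + 1, d):
            sigma[multiple] += d
    return sigma

def average_sigma(n):
    """Average of sigma(1), ..., sigma(n)."""
    return sum(sigma_table(n)[1:]) / n

SLOPE = math.pi ** 2 / 12  # the slope suggested by the Basel problem

for n in (10, 100, 1000, 10000):
    # the last two columns should approach each other
    print(n, average_sigma(n), average_sigma(n) / n, SLOPE)
```

Dividing the average by $n$ isolates the slope, and the logarithmic error term becomes negligible quite quickly.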


20.5 Looking Ahead 


Let’s recap.

• The average value of $\tau(n)$ was $\log(n) + 2\gamma - 1$.

• The average value of $\sigma(n)$ was $\left(\frac{1}{2}\sum_{d=1}^{\infty} \frac{1}{d^2}\right) n$.

  ◦ Because of Euler’s amazing solution to the Basel problem, we know
  that
  \[ \sum_{d=1}^{\infty} \frac{1}{d^2} = \frac{\pi^2}{6} \]
  so the constant in question is $\frac{\pi^2}{12}$.

We end with the question of yet another average value. What might happen
with the $\phi$ function? You can try out various ideas in the following interact.
Note that a is the coefficient and n is the power of a model $ax^n$.


def L(n):
    ls = []
    out = 0
    for i in range(1,n+1):
        out += euler_phi(i)
        ls.append((i, out/i))
    return ls


LS = L(1000)
P = line(LS)
@interact
def _(a=.01,n=2,view=(50,[25,50,..,500])):
    show(P+plot(a*x^n,0,view,color='black',linestyle="--"),
        xmin=1,xmax=view,ymax=LS[view][1])
    pretty_print(html(r"Blue is the average value of $\phi$"))
    pretty_print(html("Black dashed is $%s x^{%s}$"%(latex(a),n)))


⁸en.wikipedia.org/wiki/Basel_problem


Hopefully you started finding something interesting. However, we aren’t
ready to prove anything about this case quite yet!


20.6 Exercises 


Exercise Group. We start with some exercises testing understanding of 
Landau notation. 


1. Show that $\sigma(n)$ is $O(n^2)$ (compare to the sum of all integers up to n).


2. Use the formula for the sum of the first n perfect squares (often
encountered in a Transition to Proof course or when first doing definite
integrals in Calculus) and the previous exercise to show that the
average value of $\sigma(n)$ is Big Oh of $n^2$. (This can be loosey-goosey.)

3. Show that if g and h are both O(f) for some f, then g +h is also 
O(f). 

4. Show that if g is O(f) for some f, then if b > 0 we have that g is 
O(bf) and bg is O(f). 

5. Show that if g is O(f) for some f and if f(x) < h(x) for x large 
enough, then g is also O(h). 

6. Find a formula for the average value of the u and N functions (up through
n), where u(n) = 1 for all n and N(n) = n for all n (recall
Definition 19.2.9).


7. As suggested at the end of Subsection 20.2.1, if you are familiar with 
semilog and log-log plots and how to use them to find possible formulas, 
look up how to use them in Sage and modify the examples to explore 
whether the average value of $\tau$ could be a power function, exponential, or
something else. 


8. At the start of Subsection 20.2.1 we plot the cumulative average value of
$\tau$. Note that because

$$\tau(5) = 2 = \frac{\tau(1) + \tau(2) + \tau(3) + \tau(4)}{4}$$

this value is the same for n = 4,5. Is there ever an n > 5 where this
happens again?

9. Finish off all calculus details in the argument in Subsection 20.2.3. 

10. Finish the details of the proof that $\tau$ is $O(\sqrt{x})$.


11. Show that $\tau(n)$ is not O(1). (Hint: that means there is no constant C
such that $\tau(n) < C$ always.)


12. Suppose that for an arithmetic function f it is known that $\frac{1}{n}\sum_{k=1}^{n} f(k) = O(1)$;
why is it still possible that f(n) is not O(1)?

13. Show that $\tau(n)$ is not O(log(n)), even though it is known that
$\frac{1}{n}\sum_{k=1}^{n}\tau(k) = O(\log(n))$. (Hint: look at numbers of the form $6^k$, and
compare $\tau$ of these to any given multiple of the natural logarithm; use log
identities or calculus.)
14. Finish all calculus details of the proof of $\sigma$’s average size in Section 20.4.

15. Finish the details of the first computation of Big Oh in Subsection 20.4.2.

16. Find absolute bounds for $\phi(n)$ (simple polynomial or log formulas in terms
of n).

17. Use data, graphs, whatever to conjecture what type of growth the average
value of $\phi$ has up to n. Is it logarithmic, linear, quadratic, exponential,
something else? Bonus if you find a coefficient for the growth!


Summary: Long-Term Function Behavior 


Here, we investigate — and prove — what the long-term behavior of several 
important functions is. 


1. The first section reviews our computation of the sum-of-squares function
from the point of view of error, including the important concept of Big
Oh.

2. In Section 20.2 we begin examining the $\tau$ function from this perspective,
though without conclusive results.

3. In Section 20.3 we then carefully use geometry and limits to show that
the average value grows logarithmically, and can even give fairly accurate
information about the error.

4. Section 20.4 does the same thing, but now for the sum of divisors function.

5. Finally, Section 20.5 gives a short summary and then asks (but does not
answer) the same questions for $\phi$.


The Exercises focus mainly on understanding Landau notation and filling in 
details of the proofs. 


Chapter 21 


The Prime Counting Function


Up to now, our examples of arithmetic functions f(n) have been clearly based 
on some property of the number n itself, such as its divisors, the numbers 
coprime to it, and so forth. 

However, there is one function of prime importance which, as far as we yet 
know, bears no particularly obvious relation to the input — yet in the aggregate 
bears amazing relations to the input! It is the most mysterious one of all. 


Definition 21.0.1 The prime counting function $\pi(x)$ is defined, for all
positive numbers x, as the number of primes less than or equal to x, denoted

$$\pi(x) = \#\{p \leq x \mid p \text{ is prime}\}.$$


21.1 First Steps 


It might seem at first there is very little we can say about this function; after 
all, thus far we’ve seen no particular pattern in the primes themselves (other 
than that they are nearly all odd). You may wish to see what the function 
looks like to confirm this sense. It is a not particularly smoothly increasing 
function with no upper bound (recall Theorem 6.2.1). 


Figure 21.1.1 Plot of prime pi function (plot(prime_pi, 2, 100))

Sage note 21.1.2 Syntax for counting primes. The syntax for this
function is prime_pi(n).


21.1.1 A funky formula 


Given the skepticism of the paragraphs so far this chapter, you may be
surprised to learn there are exact formulas for this function, as well as for the nth
prime. The following formula (for n > 3) is one of my favorites (see the
Appendix of the exhaustive Hardy and Wright, [E.2.2], and also Exercise 21.5.1):

$$\pi(n) = -1 + \sum_{j=3}^{n}\left((j-2)! - j\left\lfloor\frac{(j-2)!}{j}\right\rfloor\right).$$


Can you see why this is not useful in practice? So there is plenty left for us to 
discuss. 

On the other hand, it works! We can confirm this by using the following 
code (non-interactive). 


def primeish(n):
    # small cases are handled directly; the formula is for n > 3
    if n==1:
        return 0
    elif n==2:
        return 1
    elif n==3:
        return 2
    else:
        result = -1
        fact = 1
        for j in range(3,n+1):
            fact = fact*(j-2)   # fact is now (j-2)!
            result += (fact - j*floor(fact/j))
        return result

import math
def plotprimeish(n):
    n = int(math.floor(n))
    return primeish(n)

pretty_print(html("The number of primes up to 20000 this formula gives is $%s$"%primeish(20000)))
pretty_print(html("The real function in Sage gives $%s$"%prime_pi(20000)))
pretty_print(html("And let's compare plots:"))
plot(lambda x:plotprimeish(x), (x,2,100)) + plot(prime_pi,2,100,color='black')
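For readers without Sage at hand, the same formula can be sketched in plain Python. The name `wilson_prime_count` is ours; the sketch relies on the observation that each summand $(j-2)! - j\lfloor (j-2)!/j \rfloor$ is just the remainder of $(j-2)!$ on division by j, which is 1 for an odd prime j, 2 for j = 4, and 0 for every other composite j.

```python
def wilson_prime_count(n):
    """Count primes <= n via -1 + sum_{j=3}^{n} ((j-2)! mod j); the formula needs n > 3."""
    if n <= 3:
        return max(n - 1, 0)   # pi(1) = 0, pi(2) = 1, pi(3) = 2, handled directly
    total = -1
    factorial = 1              # will hold (j-2)! exactly
    for j in range(3, n + 1):
        factorial *= (j - 2)
        total += factorial % j # same value as (j-2)! - j*floor((j-2)!/j)
    return total

print([wilson_prime_count(n) for n in (4, 10, 30, 100)])
```

The exact factorial still grows enormously, which is one way to see why the formula is impractical.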


Sage note 21.1.3 Cython. It’s possible to significantly speed up many such
computations by converting to Cython², a way to take Python/Sage and turn
it into the much-faster compiled language C. For a project, try to speed this
function up using Cython!


Sage note 21.1.4 Not all algorithms are equal. Don’t forget that just
because an algorithm works, doesn’t guarantee it will be useful in practice!
However, it’s often useful to get something correct first, and only then try to
optimize.


1 mathworld.wolfram.com/PrimeFormulas.html
2 www.cython.org


21.1.2 A very low bound 


On a more computationally feasible note, one can find a very rudimentary 
(lower) bound on this function. Recall that unadorned logarithms are the 
natural log. 


Fact 21.1.5 There are at least
$$\left\lfloor\frac{\log(\log_2(x))}{\log(2)}\right\rfloor + 1 = \lfloor\log_2(\log_2(x))\rfloor + 1$$
primes less than or equal to x.


Proof. In Saidak’s proof³ [E.7.22] of the infinitude of the primes, he constructs
the sequence

$$2,\ 2+1,\ 2(2+1)+1,\ 2(2+1)(2(2+1)+1)+1,\ \ldots$$


Then he shows, similarly to Euclid’s proof, that there is at least one new prime 
divisor in each element of the sequence (even if not necessarily a larger one). 
So the nth prime can be no bigger than the nth term of this sequence. (See 
Exercise 21.5.3.) 

By induction, we see that this term (and hence the nth prime) is less than or
equal to $2^{2^{n-1}}$.


• The case n = 1 is clear, since the first prime is 2.


• The nth term is the previous terms multiplied together, plus 1, which by
induction is at most

$$2^{2^0}\cdot 2^{2^1}\cdots 2^{2^{n-2}} + 1 = 2^{1+2+\cdots+2^{n-2}} + 1 = 2^{2^{n-1}-1} + 1 \leq 2^{2^{n-1}}$$
(this uses the same type of technique as in Subsection 4.5.2).


So when $\pi(x) = n$, the nth term in the sequence is at most $2^{2^{\pi(x)-1}}$, which can’t be
less than n itself (the nth prime is certainly at least n). If we rewrite this as
$2^{2^{\pi(x)-1}} \geq n$, we can take two logs to get

$$\log(\log(2^{2^{\pi(x)-1}})) = \log(2^{\pi(x)-1}\log(2)) = (\pi(x)-1)\log(2)+\log(\log(2)) \geq \log(\log(n)).$$

This yields the given statement⁴, with the floor function accounting for the
fact that $\pi$ takes only integer values. ∎

As you can see below, this is not a very useful bound, considering there 
are actually 25 primes less than 100, not three! Each of the inequalities in 
the proof was in a sense ‘wasteful’. Note also that the floor function is only 
necessary for x < 5. 
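To see just how wasteful the bound is, here is a minimal plain-Python sketch that tabulates $\lfloor\log_2(\log_2(x))\rfloor + 1$ next to the true count. The helper names `loglog_lower_bound` and `prime_count` are ours, and the prime counter uses naive trial division, so it is only meant for small x.

```python
import math

def loglog_lower_bound(x):
    """floor(log2(log2(x))) + 1, the lower bound of Fact 21.1.5 (needs x >= 2)."""
    return math.floor(math.log2(math.log2(x))) + 1

def prime_count(x):
    """Number of primes <= x by trial division (fine for small x only)."""
    return sum(1 for k in range(2, int(x) + 1)
               if all(k % d for d in range(2, math.isqrt(k) + 1)))

for x in (2, 4, 16, 100, 1000):
    # columns: x, the lower bound, the actual number of primes
    print(x, loglog_lower_bound(x), prime_count(x))
```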


3 t5k.org/notes/proofs/infinite/Saidak.html
4 See also [E.2.1, Corollaries 2.7 and 2.8] for this proof, but connected more directly to
Euclid’s proof of the infinitude of primes.


Figure 21.1.6 Plot of prime pi versus log log


21.1.3 Knowledge from nowhere 


Finally, although it may not seem evident, you should know that it is not 
necessary to actually find all the first n primes (even of a particular type) to 
compute how many there are, at least not always. 


Definition 21.1.7 Let $\phi(n,a)$ be the number of positive integers less than
or equal to n which are not divisible by any of the first a primes. (We can label these
primes $p_1$ through $p_a$ for convenience.) ◊
Try Exercise 21.5.2 to see how this function works.
It is possible to develop the recursive formula

$$\phi(n,a) = \phi(n,a-1) - \phi\left(\left\lfloor\frac{n}{p_a}\right\rfloor, a-1\right)$$

which allows use of a type of inductive argument to compute $\phi(n, a)$ without
having to use many computational resources. It is then not too hard to use a
counting argument to prove that

$$\pi(n) = \pi(\sqrt{n}) + \phi(n, \pi(\sqrt{n})) - 1.$$


This is the typical way to calculate $\pi$ in software without actually counting
primes, and with some speedups it can be quite efficient. 
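The recursion and the counting formula can be sketched in a few lines of plain Python. This is only an illustration under our own naming (`small_primes`, `legendre_pi`, and the inner `phi`), not how Sage implements `prime_pi`; here `phi(m, a)` counts the integers from 1 through m not divisible by any of the first a primes, and only the primes up to $\sqrt{n}$ are ever generated.

```python
import math
from functools import lru_cache

def small_primes(limit):
    """All primes <= limit, by a basic sieve of Eratosthenes."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False] * min(2, limit + 1)
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            for j in range(i * i, limit + 1, i):
                flags[j] = False
    return [i for i, is_p in enumerate(flags) if is_p]

def legendre_pi(n):
    """pi(n) = pi(sqrt n) + phi(n, pi(sqrt n)) - 1, for an integer n >= 2."""
    primes = small_primes(math.isqrt(n))

    @lru_cache(maxsize=None)
    def phi(m, a):
        # integers in 1..m not divisible by any of the first a primes
        if a == 0:
            return m
        return phi(m, a - 1) - phi(m // primes[a - 1], a - 1)

    a = len(primes)            # this is pi(sqrt(n))
    return a + phi(n, a) - 1

print(legendre_pi(100), legendre_pi(10**5))
```

Caching the recursive calls is one of the simplest possible speedups; production implementations use considerably more refined ones.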

Interestingly, this is also how one finds the nth prime⁵. You use an
approximation to the nth prime like n log(n) and then check values of $\pi(n)$ near that
point to see where the value changes, which should lead you exactly to the
prime you seek. (Recall Sage note 4.2.1 about %time when using the following
cell.)


%time nth_prime(10^7)


5 The paper [E.7.36] has a result too delightful not to share, that there is a specific irrational
number close to three which can generate the nth prime. It is of course just as useful as 
Subsection 21.1.1, since we have to determine the digits of this constant experimentally! 
21.2 Some History


Somewhat remarkably, given how long humans have been studying primes, the 
first people we know of compiling substantial data about them are Gauss and 
Legendre, around 1800. 


Legendre first tried to estimate $\pi(x)$. He said that $\pi(x) \approx \frac{x}{\log(x) - A}$, where
he fudges the constant $A \approx 1.08366$. More precisely, he claimed that $\pi(x)$ is
asymptotic to this function.


Definition 21.2.1 We say that two functions f(x) and g(x) are asymptotic
to each other when

$$\lim_{x\to\infty}\frac{f(x)}{g(x)} = 1.$$

Essentially, in the long run these functions get as close to each other as you
like, on a percentage basis. ◊

Here is another way to think about this. Think of the average chance
of a number of size x being prime; Legendre guessed this was of the form
$\frac{1}{\log(x) - A}$. This general notion was based on a lot of data he had collected, and
the constant A he finally settled on seemed to give the best match to the data. 
(See also Figure 21.2.4 below.) 

Not long after this, Gauss came up with a solution that was more elegant 
— and despite not being ‘fitted’ to the data in the same way, was correct. And 
he didn’t tell anyone for over fifty years! Gauss’ conjecture was that 


$$\lim_{x\to\infty}\frac{\pi(x)}{x/\log(x)} = 1.$$
Or, using our new term, $\pi(x)$ is asymptotic to $\frac{x}{\log(x)}$.


21.2.1 The first really accurate estimate and errors 


In fact, Gauss makes this estimate even more precise, in a letter to his former 
student Johann Encke. Here is the general idea. 

First, reinterpret the proportion as suggesting that 1/log(x) integers near
x are prime. If we do that, then we can think of 1/log(x) as a probability
density function. What do we do with such functions? We integrate the
function to get the cumulative amount!

That is, we should expect that $\pi(x) \approx \int_2^x \frac{dt}{\log(t)}$ or equivalently, in terms
of the following definition, $\pi(x) \approx Li(x)$.


Definition 21.2.2 We give the name logarithmic integral⁶ to the (convergent)
integral $Li(x) = \int_2^x \frac{dt}{\log(t)}$. ◊

That a function as rigid as $\pi$ would be close to an integral function should
sound like it has a 100% probability of being crazy! But Gauss was no fool, 
and the accuracy is astounding. 


6 There is also a definition for this integral, $\int_0^x \frac{dt}{\log(t)}$, which has a properly defined value
(beyond the level of this course) despite the integrand going to negative infinity. The form
used for the prime counting function is traditionally the one with lower bound 2, for reasons
clear in the rest of this text. There are no divergence issues at stake.

Figure 21.2.3 Plot of prime pi function versus log integral


@interact
def _(n=100):
    show(plot(prime_pi,3,n,color='black',legend_label=r'$\pi(x)$') +
         plot(x/log(x),3,n,color='red',legend_label=r'$x/\log(x)$') +
         plot(Li,3,n,color='green',legend_label='$Li(x)$'))


Notice how much closer Li(x) is to the actual value of $\pi(x)$ than the
x/log(x) estimate. It’s usually closer by several orders of magnitude, as you 
can try verifying numerically in the following interact. 


@interact
def _(n=[100,1000,1000000,1000000000]):
    P = prime_pi(n)
    pretty_print(html(r"$\pi(%s)=%s$"%(n,prime_pi(n))))
    pretty_print(html(r"The error with $%s/\log(%s)$ is $\approx %s$"%(n,n,P-(n/log(n)).n())))
    pretty_print(html(r"The error with $Li(%s)$ is $\approx %s$"%(n,(P-Li(n)).n())))
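Outside of Sage, the same comparison can be approximated with only the standard library. The sketch below is an assumption-laden stand-in: our `Li` uses Simpson's rule on $\int_2^x dt/\log(t)$ rather than Sage's built-in `Li`, and `prime_count` is a simple sieve rather than `prime_pi`.

```python
import math

def Li(x, steps=20000):
    """Approximate the integral of 1/log(t) from 2 to x with Simpson's rule."""
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even step count
    h = (x - 2) / steps
    total = 1 / math.log(2) + 1 / math.log(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) / math.log(2 + k * h)
    return total * h / 3

def prime_count(n):
    """Number of primes <= n by a sieve of Eratosthenes."""
    flags = [True] * (n + 1)
    flags[0:2] = [False] * min(2, n + 1)
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return sum(flags)

for n in (100, 1000, 10**6):
    P = prime_count(n)
    # columns: n, pi(n), error of n/log(n), error of Li(n)
    print(n, P, P - n / math.log(n), P - Li(n))
```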
Figure 21.2.4 Excerpt from Gauss’ letter to Encke on prime numbers


21.2.2 Exploring Li 


One of the brilliant aspects of the internet is how much easier it is to find
source material of such things. Courtesy of the digitization center at the State
and University Library of Göttingen⁷ (the university where Gauss worked), you
can see a scan of the actual letter⁸ in question⁹.

In Figure 21.2.4, Gauss is comparing his calculations of the number of primes
with his formula, as well as those of his correspondent and Legendre. Whether
or not you can read Gauss’ (quite legible) German, you can still note how in the
last set of numbers he is essentially doing data science on Legendre’s formula,
with A as the modeling variable, using more and more detailed training sets!


Can we try for some more analysis? Since we saw that x/log(x) didn’t seem
to be as good an approximation, we’ll leave it out for now. This graphic shows
two representative 1000-wide stretches, and the following interact allows you
to explore more of them.


7 gdz.sub.uni-goettingen.de
8 gauss.adw-goe.de/handle/gauss/199
9 Thanks to Martin Liebetruth for helpful correspondence; another helpful article on this,
by Yuri Tschinkel, is in the Bulletin of the AMS (search www.ams.org for ‘gauss tschinkel’).

Figure 21.2.5 Compare prime pi and Li over two ranges


@interact
def _(n=1000):
    P = plot(prime_pi,3,n,color='black',legend_label=r'$\pi(x)$')
    P += plot(Li,3,n,color='green',legend_label='$Li(x)$')
    show(P, xmin=max(n-1000,0), ymin=prime_pi(max(n-1000,0)))


Based on this evidence, it seems clear that Li(x), even if it’s a good ap- 
proximation, should not ever be less than the actual count of primes. And yet, 
the English mathematician John Littlewood proved the following result. 


Fact 21.2.6 For any number x, there is an x′ > x such that

$$Li(x') < \pi(x').$$


Historical remark 21.2.7 Skewes’ Number. As remarkable as this seems, 
Littlewood’s student Stanley Skewes proved the following even more amazing 
fact: 


The first time this happens is no higher than

$$10^{10^{10^{1000}}}.$$

In Skewes’ original paper, this bound had a 34 instead of 1000 in the last
exponent¹⁰, but that result relied upon a special assumption (the so-called
Riemann Hypothesis, see Chapter 25). Both of these bounds are known as
Skewes’ number¹¹.

We have known since the 1960’s¹² that there is an actual run of integers
where Li is smaller starting near $1.53 \times 10^{1165}$. As of this writing we know that
the first time this “switch” happens is no higher than $1.4 \times 10^{316}$ (see [E.7.23]
and a follow-up from 2015 for the state of the art, as well as the book [E.4.27]).
Of course, we haven’t even gotten remotely near those bounds with computers,
although computation is necessary to help obtain these bounds.

This uncertainty sounds terrible, but actually is good news. After all, if $\pi$
beats Li once in a while, then Li must be a great approximation indeed! So,
just how great is it?


10 Actually, the exact bound was $e^{e^{e^{79}}}$.
11 oeis.org/wiki/Skewes_number
12 youtu.be/Lihh_LMmcDw?t=498


21.3 The Prime Number Theorem 


It turns out Li(x) is a pretty good approximation indeed. 


21.3.1 Stating the theorem 
Theorem 21.3.1 Prime Number Theorem. If $\pi(x)$ is the number of
primes $p \leq x$, then
$$\lim_{x\to\infty}\frac{\pi(x)}{Li(x)} = 1.$$
In fact, the first bound also has this property (see Exercise 21.5.6):

$$\lim_{x\to\infty}\frac{\pi(x)}{x/\log(x)} = 1.$$


Historical remark 21.3.2 The Prime Number Theorem. The Prime 
Number Theorem was conjectured by Bernhard Riemann in his only paper on 
number theory. It was proved about 100 years after the initial investigations 
of Gauss by the French and Belgian mathematicians Jacques Hadamard and 
Charles-Jean de la Vallée-Poussin. They made good use of the analytic methods 
we are slowly approaching. 

Any proof of this is well beyond the bounds of this text. One of several
modern versions is in the analytic number theory text [E.4.6] by Apostol; see
also [E.2.9]. Additionally, as a series of exercises (!) in that book, one can also
explore a proof¹³ due to Selberg and Erdős that is “elementary”, in the sense
of not using complex-valued integrals. There is a well-known exposition of a
very similar proof in [E.2.2], and another in [E.4.4].

Later, we’ll see that many better approximations to $\pi(x)$ exist which come
out of this sort of thinking. Notice how the approximations in the next
interactive cell take the logarithmic integral and subtract various correction factors
in the attempt to get closer.


@interact
def _(n=100):
    P = plot(prime_pi,3,n,color='black',legend_label=r'$\pi(x)$')
    P += plot(Li,3,n,color='green',legend_label='$Li(x)$')
    P += plot(lambda x: Li(x) - sqrt(prime_pi(x)),3,n,
              color='orange',legend_label=r'$Li(x)-\sqrt{\pi(x)}$')
    P += plot(lambda x: Li(x) - .5*Li(sqrt(x)),3,n,
              color='red',
              legend_label=r'$Li(x)-\frac{1}{2}Li(\sqrt{x})$')
    P += plot(lambda x: Li(x) - sqrt(x)/log(x),3,n,
              color='purple',
              legend_label=r'$Li(x)-\sqrt{x}/\log(x)$')
    show(P, xmin=max(n-1000,0), ymin=prime_pi(max(n-1000,0)))


13 There is an interesting controversy behind this proof which is worth looking up. Selberg
was an early Fields medalist, and Erdős was one of the most prolific mathematicians of all
time.

21.3.2 Chebyshev's contributions


Although we cannot explore the theorem itself in depth, we can try to understand 
some of the intermediate steps. This is a good place to highlight the contributions 
of the great Russian mathematician Chebyshev. 


Historical remark 21.3.3 Pafnuty Chebyshev. Chebyshev¹⁴ (Чебышёв)
was a prominent Russian mathematician of the mid-19th century, but his 
most important legacy may be bringing the native Russian tradition into 
international prominence. (Recall that Euler worked in Russia for much of 
his life, but alongside other Swiss scientists.) In addition to fundamental 
advances in this type of number theory, he worked on the theory of orthogonal 
polynomials which is used so much today in applications, and probability theory 
underlying modern statistics. 

He was the first person to prove a conjecture known (even today!) as 
Bertrand's Postulate, after the French mathematician who first proposed it. 


Theorem 21.3.4 Bertrand's Postulate. For any integer n > 2, there is a 
prime between n and 2n. 

Proof. It is actually quite possible to prove this at the level we have reached,
but any proof is long enough¹⁵ to take us a little far afield. ∎


Try testing it yourself below! 


@interact
def _(n=25):
    pretty_print(html("$%s$ is a prime between $%s$ and $%s$"%(next_prime(n),n,2*n)))
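A brute-force check of many values at once is also easy to sketch in plain Python. The helper names `is_prime` and `bertrand_witness` are ours; this is evidence for a finite range, of course, not a proof.

```python
import math

def is_prime(k):
    """Trial-division primality test, adequate for small k."""
    return k >= 2 and all(k % d for d in range(2, math.isqrt(k) + 1))

def bertrand_witness(n):
    """Smallest prime p with n < p < 2n, or None if there is none."""
    for p in range(n + 1, 2 * n):
        if is_prime(p):
            return p
    return None

# every n in this range should produce a witness
print(all(bertrand_witness(n) is not None for n in range(2, 2000)))
print(bertrand_witness(25))
```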


On a related note, although this proves you can't have too long of stretches 
without prime numbers, you can certainly have arbitrary stretches of composite 
numbers. See Exercise 21.5.7 for an easy example. Paul Nahin, in [E.7.13], 
describes the following more clever example, a cute result of Louis A. Graham. 


Fact 21.3.5 Multiply all the primes p from 2 to n+1 to get $N = \prod_{2 \leq p \leq n+1} p$.
Then we have n consecutive composite integers from N − (n+1) to N − 2.

Proof. We know that N is a multiple of a prime factor¹⁶ of each number x from
2 to n + 1. For each such x and prime factor $p_x$, Proposition 1.2.8 guarantees
that N − x is also a multiple of $p_x$. ∎
Try testing it yourself below! 


@interact
def _(n=5):
    N = prod(prime_range(n+2))
    pretty_print(html("The numbers between $%s$ and $%s$ are all composite"%(N-(n+1),N-2)))
    L = [N-(n+1)..N-2]
    print(L)
    pretty_print(html("have factors"))
    print([l.divisors()[1] for l in L])   # smallest divisor greater than 1
    pretty_print(html("and there are $%s$ of them"%(len(L))))
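The same construction can be sketched in plain Python, with `graham_run` and `smallest_factor` as our own helper names. One caveat the sketch makes visible: for very small n the number N − x can equal the prime $p_x$ itself (for n = 2 we get N = 6 and N − 3 = 3), so the check below starts at n = 4.

```python
import math

def smallest_factor(k):
    """Smallest divisor of k greater than 1 (k itself when k is prime); needs k >= 2."""
    for d in range(2, math.isqrt(k) + 1):
        if k % d == 0:
            return d
    return k

def graham_run(n):
    """The n integers N-(n+1), ..., N-2, where N is the product of the primes up to n+1."""
    primes = [p for p in range(2, n + 2) if smallest_factor(p) == p]
    N = math.prod(primes)
    return [N - x for x in range(n + 1, 1, -1)]

run = graham_run(5)
print(run)                                   # the run for n = 5
print([smallest_factor(k) for k in run])     # a proper factor of each entry
```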


14 www-history.mcs.st-and.ac.uk/Biographies/Chebyshev.html
15 en.wikipedia.org/wiki/Proof_of_Bertrand's_postulate
16 In fact, all such factors.

More immediately germane to our task of looking at $\pi(x)$ and its value,
Chebyshev proved the first substantial result on the way to the Prime Number
Theorem, validating Legendre's intuition.


Theorem 21.3.6 Big Oh of Prime Pi. It is true both that:

• $\pi(x)$ is $O\left(\frac{x}{\log(x)}\right)$ and

• $\frac{x}{\log(x)}$ is $O(\pi(x))$.

Interestingly, this is not the same as the Prime Number Theorem; see
Exercise 21.5.8.
What we will show here is the gist of a smaller piece of this theorem.


Proposition 21.3.7 For all positive x, $\pi(x) < 2\frac{x}{\log(x)}$.


Proof. We follow Stopple's presentation in Section 5.2 of [E.4.5] closely in 
sketching out most of a proof of this below; see also [E.2.11] for a very similar 
proof. It is a little longer than some of our other proofs. It uses some very 
basic combinatorial ideas and calculus facts, however, so it is a great example 
of several parts of mathematics coming together. 

First, it's not hard to verify this for x < 1000, as the following figure
demonstrates.


Figure 21.3.8 Plot of prime pi function versus $2x/\log(x)$


Now we'll proceed by induction, in an unusual way. We'll assume it is true for 
n, and prove it is true for 2n. This needs a little massaging for odd numbers, 
but is a legitimate induction method. 

With this in mind, we first assume that $\pi(n) < 2\frac{n}{\log(n)}$. Now what?

Below, in Lemma 21.3.9 we look at the product of all the primes (if any) between
n and 2n, which we write as

$$P = \prod_{n<p<2n} p.$$

In that result some combinatorial thinking leads to the following estimate:

$$n^{\pi(2n)-\pi(n)} < P \leq \frac{(2n)!}{n!\,n!} < 2^{2n}.$$

These bounds show that P is between a certain power of n and a certain power
of 2.

Now we will manipulate this to get the final result. Begin by taking log of both
ends to get

$$(\pi(2n) - \pi(n))\log(n) < 2n\log(2).$$
Now divide out and isolate to get

$$\pi(2n) < \frac{2n\log(2)}{\log(n)} + \pi(n) < \frac{2n\log(2)}{\log(n)} + 2\frac{n}{\log(n)} = 2n\,\frac{\log(2)+1}{\log(n)}.$$

In Exercise 21.5.10 you will show that, as long as n > 1000, we have the
inequality

$$\frac{\log(2)+1}{\log(n)} < \frac{2}{\log(2)+\log(n)} = \frac{2}{\log(2n)}.$$
Now we can put it all together to see that

$$\pi(2n) < 2n\,\frac{\log(2)+1}{\log(n)} < 2\,\frac{2n}{\log(2n)},$$

which is exactly what the proposition would predict.
To rescue this for 2n + 1, we need another calculus comparison. First, from
above we have

$$\pi(2n+1) \leq \pi(2n) + 1 < 2n\,\frac{\log(2)+1}{\log(n)} + 1 = (2+2\log(2))\frac{n}{\log(n)} + 1.$$

Since $2\frac{2n+1}{\log(2n+1)} > \frac{4n}{\log(2n+1)}$, it will suffice then to show

$$(2+2\log(2))\frac{n}{\log(n)} + 1 < \frac{4n}{\log(2n+1)}.$$

Since n > 1000 and $\frac{n}{\log(n)}$ is increasing, $\frac{1}{n/\log(n)} < 0.007$, so

$$(2+2\log(2))\frac{n}{\log(n)} + 1 < (2+2\log(2)+0.007)\frac{n}{\log(n)} < 3.394\,\frac{n}{\log(n)}.$$

To finish it suffices to show that in this range

$$3.394\,\frac{n}{\log(n)} \leq \frac{4n}{\log(2n+1)}.$$

Showing the last (purely calculus) steps is Exercise 21.5.11. ∎
Lemma 21.3.9 Let the product of all the primes between n and 2n be written

$$P = \prod_{n<p<2n} p.$$

Then we can bound it as

$$n^{\pi(2n)-\pi(n)} < P < 2^{2n}.$$

Proof. Think of all the primes in P. On the one hand, each of these
primes p is greater than n, and there are $\pi(2n) - \pi(n)$ of them. So

$$n^{\pi(2n)-\pi(n)} < P.$$

On the other hand, each of these primes is greater than n but they are all in
the list of numbers from n to 2n, so their product divides

$$\frac{(2n)\cdot(2n-1)\cdot(2n-2)\cdots(n+1)}{n\cdot(n-1)\cdot(n-2)\cdots 1}.$$

(The product P divides the numerator, and since each prime involved is bigger than n
it shares no factor with the denominator.) That is to say P is a factor of a binomial coefficient

$$P \;\Big|\; \frac{(2n)\cdot(2n-1)\cdot(2n-2)\cdots(n+1)}{n\cdot(n-1)\cdot(n-2)\cdots 1} = \frac{(2n)!}{n!\,n!}$$

and in particular,

$$P \leq \frac{(2n)!}{n!\,n!}.$$

We are now ready for the conceptual key of the proof, which uses the
combinatorial leitmotif of counting things in two different ways. Namely, we
reinterpret this factorial fraction as the number of ways to choose n things
from a collection of 2n things! And the number of ways to choose n things
is certainly less than the number of ways to pick any old collection out of 2n
things, which is $2^{2n}$ (because you either pick it or you don't).

Since we showed both bounds, this concludes the proof. ∎
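The three ingredients of the lemma are easy to test numerically. In this plain-Python sketch (helper names `primes_between` and `check_lemma` are ours) we check, for each n, that $n^{\pi(2n)-\pi(n)} < P$, that P divides the central binomial coefficient, and that the binomial coefficient is less than $2^{2n}$. The check starts at n = 2 so that there is at least one prime in the product.

```python
import math

def primes_between(n):
    """Primes p with n < p < 2n, found by trial division."""
    return [p for p in range(n + 1, 2 * n)
            if all(p % d for d in range(2, math.isqrt(p) + 1))]

def check_lemma(n):
    """Return the truth values of the three claims in the lemma for this n."""
    ps = primes_between(n)
    P = math.prod(ps)
    central = math.comb(2 * n, n)          # (2n)! / (n! n!)
    return (n ** len(ps) < P,              # lower bound on P
            central % P == 0,              # P divides the binomial coefficient
            central < 2 ** (2 * n))        # counting subsets of a 2n-element set

print(check_lemma(10))
print(all(check_lemma(n) == (True, True, True) for n in range(2, 200)))
```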


21.4 A Slice of the Prime Number Theorem 


We end this chapter with a substantial piece of a real proof in the direction 
of the Prime Number Theorem, courtesy of a function also first introduced 
by Chebyshev. The argument is dense, but requires nothing beyond calculus 
and a willingness to allow a lot of algebraic and integral manipulation for the 
purposes of estimation. (See a good calculus text¹⁷ to review integral concepts.)


21.4.1 Functions to know 


First, we'll review the main function. Think of the prime counting function $\pi$
as a so-called step function, where every time you hit a new prime you add 
1. The picture reminds you of this attribute. 


Figure 21.4.1 Plot of prime pi function


17 activecalculus.org/single/C-4.html

Let’s define a new function in that spirit. Instead of adding 1 each time x
hits a prime, we will add log(p) (recall that this is the natural logarithm) each 
time we hit a prime p. Of course, this value we add will get bigger as p gets 
bigger. 


Figure 21.4.2 Plot of Chebyshev theta function

Definition 21.4.3 We call the function given by this formula Chebyshev’s
theta function:
$$\Theta(x) = \sum_{p \leq x} \log(p).$$
◊


Sage note 21.4.4 Python can do math too. We include an interactive 
version so you can see the code. 


import math

# Chebyshev theta: add log(p) for each prime p up to x
def theta(x): return sum(math.log(p) for p in prime_range(1, floor(x)+1))

@interact
def _(n=100):
    show(plot(theta,1,n))


The syntax math.log is referring to Python’s builtin calculation of the
natural logarithm, accessible in the math module¹⁸. This is sometimes faster and
easier to use than Sage’s more powerful capabilities, because if you put an
integer in Sage’s logarithm, it will normally not approximate it. All we want
here is an easy approximation, so this should be faster.

Earlier in this chapter we noted that the Prime Number Theorem is logically
equivalent to the limit $\lim_{x\to\infty}\frac{\pi(x)}{x/\log(x)} = 1$. There are actually many such
logical equivalences. One of them involves $\Theta$:

$$\lim_{x\to\infty}\frac{\Theta(x)}{x} = 1.$$


This is certainly numerically plausible. Here is a plot of both limits, along with 
the constant function 1. 


18 docs.python.org/2.7/library/math.html

Figure 21.4.5 Plot of Chebyshev theta function limits (curves: Chebyshev Theta limit, Prime Number Theorem limit)


In the interact below, there is an option for an Li versus x/log(x) version 
of the theorem. Note how much better the prime number theorem limit looks 
with the Li version. 


import math

def theta(x): return sum(math.log(p) for p in prime_range(1, floor(x)+1))
def pnt(n): return prime_pi(n)*log(n)/n      # pi(n) / (n/log(n))
def pntli(n): return prime_pi(n)/Li(n)       # pi(n) / Li(n)
def thox(n): return theta(n)/n               # Theta(n) / n
@interact
def _(end=100000,PNT=['log','Li']):
    P = plot(1,(1,end),color='black')
    P += plot(thox,(1,end),legend_label='Chebyshev Theta')
    if PNT == "log":
        P += plot(pnt,(1,end),color='red',legend_label='Prime Number Theorem')
    if PNT == "Li":
        P += plot(pntli,(1,end),color='red',legend_label='Prime Number Theorem')
    show(P)


As usual, proving such things completely is beyond the level of this course, 
but we can prove the following partial implication. 


Proposition 21.4.6 If the Prime Number Theorem is true, then it is also true
that $\Theta(x)/x$ approaches 1.
Proof. The rest of this section is the proof. ∎


21.4.2 Getting a formula with sleights of hand 


In order to prove this implication, we will first need a formula telling us more
about $\Theta(x)$. Our strategy¹⁹ will be to first turn $\Theta(x)$ into an even more
hopelessly complicated sum, but then use calculus trickily to get something
usable by summing up integrals.

In order to do this, we need two subsidiary functions. First, recall the
notation $\lfloor x\rfloor$ for the greatest integer less than or equal to x. Secondly:


19 This is an expansion of the terse approach taken in [E.4.6, Theorems 4.3 and 4.4].

Definition 21.4.7 We let a(n) be the prime number indicator function defined
by

$$a(n) = \begin{cases} 1 & \text{if } n \text{ is prime} \\ 0 & \text{otherwise} \end{cases}$$

Figure 21.4.8 Prime $\pi$ versus a indicator function


One can get the indicator function just by writing prime_pi(x)-prime_pi(x-1).
For convenience we write $m = \lfloor x\rfloor$. Then we can rewrite these step
functions as weighted sums of a(n):

$$\pi(x) = \sum_{n=1}^{m} a(n) \quad\text{and}\quad \Theta(x) = \sum_{n=1}^{m} a(n)\log(n).$$

Our goal is to rearrange $\Theta$ to be a sum of terms involving $\pi$. First we turn
$\Theta$ into a difference of sums by rearranging (and using log(1) = 0):

$$\Theta(x) = \sum_{n=2}^{m}[\pi(n) - \pi(n-1)]\log(n) = \sum_{n=2}^{m}\pi(n)\log(n) - \sum_{n=1}^{m-1}\pi(n)\log(n+1).$$

This difference of sums can be combined into a single sum, with just two left
over terms²⁰.

$$\Theta(x) = \sum_{n=2}^{m-1}\pi(n)[\log(n) - \log(n+1)] + \pi(m)\log(m) - \pi(1)\log(2).$$


To continue, we will rewrite almost all of this as a single integral. We use a
few key facts:


• The difference which appears in the sum in the immediately preceding
$\Theta$ formula can be considered as an integral,
$-(\log(n+1) - \log(n)) = -\int_n^{n+1}\frac{dt}{t}$.

• We have that $\pi(x)$ is constant on the interval $[\lfloor x\rfloor, x]$, and in particular
on any given interval [n, n+1), so it may be factored out of any integral
from n to n+1.

• We can rearrange and add sums and integrals as usual.

• Note that $\pi(1) = 0$, so the second extra term is zero.


20 Students a little more familiar with calculus may want to compare this process to
integration by parts, but in a discrete context, sometimes called Abel summation.


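This yields the following rewrite.

$$\Theta(x) = -\sum_{n=2}^{m-1}\left[\pi(n)\int_n^{n+1}\frac{dt}{t}\right] + \pi(m)\log(m)$$
$$= -\sum_{n=2}^{m-1}\left[\pi(n)\int_n^{n+1}\frac{dt}{t}\right] + \pi(m)\log(m) - \pi(x)\log(x) + \pi(x)\log(x)$$
$$= -\int_2^m\frac{\pi(t)}{t}dt - \int_m^x\frac{\pi(t)}{t}dt + \pi(x)\log(x) = \pi(x)\log(x) - \int_2^x\frac{\pi(t)}{t}dt.$$

Now we have a formula for $\Theta$ which will allow us to prove something.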
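Because $\pi$ is a step function, the integral in this formula can be evaluated exactly piece by piece, which makes the identity easy to test numerically. The following plain-Python sketch does so; the helper names `prime_flags`, `theta`, and `theta_from_pi` are ours.

```python
import math

def prime_flags(m):
    """flags[k] is True exactly when k is prime, for 0 <= k <= m."""
    flags = [True] * (m + 1)
    flags[0:2] = [False] * min(2, m + 1)
    for i in range(2, math.isqrt(m) + 1):
        if flags[i]:
            for j in range(i * i, m + 1, i):
                flags[j] = False
    return flags

def theta(x):
    """Chebyshev theta: the sum of log(p) over primes p <= x."""
    flags = prime_flags(math.floor(x))
    return sum(math.log(p) for p, is_p in enumerate(flags) if is_p)

def theta_from_pi(x):
    """pi(x)*log(x) minus the integral of pi(t)/t from 2 to x (needs x >= 2)."""
    m = math.floor(x)
    flags = prime_flags(m)
    count = 0                     # running value of pi(n)
    integral = 0.0
    for n in range(2, m):
        count += flags[n]
        integral += count * math.log((n + 1) / n)   # pi is constant on [n, n+1)
    count += flags[m]
    integral += count * math.log(x / m)              # the last piece, [m, x]
    return count * math.log(x) - integral

for x in (10, 100.5, 5000):
    print(x, theta(x), theta_from_pi(x))   # the two columns should agree
```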


21.4.3 Finish the proof 


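We can divide the formula $\Theta(x) = \pi(x)\log(x) - \int_2^x \frac{\pi(t)}{t}\,dt$ by $x$:

$$\frac{\Theta(x)}{x} = \frac{\pi(x)\log(x)}{x} - \frac{\int_2^x \frac{\pi(t)}{t}\,dt}{x}.$$

Given that the Prime Number Theorem says that $\lim_{x\to\infty}$ of the fraction with
$\pi(x)$ in it is 1, proving that $\lim_{x\to\infty}\frac{\Theta(x)}{x}$ is also 1 is equivalent to proving

$$\lim_{x\to\infty}\frac{1}{x}\int_2^x\frac{\pi(t)}{t}\,dt = 0.$$

Now, the Prime Number Theorem also implies that $\frac{\pi(t)}{t}$ and $\frac{1}{\log(t)}$ are
asymptotic (recall Definition 21.2.1), so that their averaging integrals

$$\frac{1}{x}\int_2^x\frac{\pi(t)}{t}\,dt \quad\text{and}\quad \frac{1}{x}\int_2^x\frac{dt}{\log(t)}$$

clearly are also asymptotic.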

This reduces our proof to showing that the average value of $1/\log(t)$ tends
to zero. Since integrals have a graphical interpretation, we now use the following
graph of the integral limit to finish the proof!

Consider that one possible upper sum for the integral of $1/\log(t)$ between 2
and 9 is the area of the two rectangles shown below, one with area $\frac{1}{\log(2)}(\sqrt{9}-2)$
and the other with area $\frac{1}{\log(\sqrt{9})}(9-\sqrt{9})$. (Of course $\sqrt{9} = 3$ but this form is
more useful here.)
Figure 21.4.9 Estimating integrals for proof


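In general, the same argument should hold, so a possible overestimate of
$\int_2^x dt/\log(t)$ is

$$\frac{1}{\log(2)}\left(\sqrt{x} - 2\right) + \frac{1}{\log(\sqrt{x})}\left(x - \sqrt{x}\right)$$

and we want the limit as $x \to \infty$ of $\frac{1}{x}$ times that quantity.
Now is the time to recklessly use logarithmic identities:

$$\begin{aligned}
&\frac{1}{x}\left(\frac{1}{\log(2)}\left(\sqrt{x}-2\right) + \frac{1}{\log(\sqrt{x})}\left(x-\sqrt{x}\right)\right) \\
&= \frac{1}{\log(2)x^{1/2}} - \frac{2}{x\log(2)} + \frac{1}{\log(\sqrt{x})} - \frac{1}{\log(\sqrt{x})x^{1/2}} \\
&= \frac{1}{\log(2)x^{1/2}} - \frac{2}{x\log(2)} + \frac{2}{\log(x)} - \frac{2}{\log(x)x^{1/2}}
\end{aligned}$$

This last expression has positive powers of $x$ and their logs in the denominators,
so it pretty clearly goes to zero as $x \to \infty$.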
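For readers who would like numbers rather than a picture, here is a small sketch in plain Python. It evaluates the two-rectangle overestimate divided by $x$, next to a rough midpoint-rule value of the averaged integral. The function names and the step count are illustrative assumptions, and the sketch assumes $x \geq 4$ so that $\sqrt{x} \geq 2$.

```python
import math

def overestimate(x):
    """Area of the two rectangles lying above 1/log(t) on [2, sqrt(x)] and [sqrt(x), x]."""
    r = math.sqrt(x)
    return (r - 2) / math.log(2) + (x - r) / math.log(r)

def integral_midpoint(x, steps=20000):
    """Midpoint-rule approximation of the integral of 1/log(t) from 2 to x."""
    h = (x - 2) / steps
    return h * sum(1 / math.log(2 + (k + 0.5) * h) for k in range(steps))

for x in [16, 100, 10**4, 10**6]:
    # columns: x, averaged integral, averaged overestimate
    print(x, integral_midpoint(x) / x, overestimate(x) / x)
```

Both columns shrink as $x$ grows, with the overestimate always on top, which is the behaviour the algebra predicts.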

If the algebra doesn’t convince you, perhaps an interactive graph will. Below,
black is the overestimate to the integral and red is $1/x$ times that overestimate.


import math

@interact
def _(top=(16,[n^2 for n in [2..10]])):
    f(x) = 1/log(x)
    P = plot(f,1,top+1)
    # Black: the two rectangles of the upper sum, split at sqrt(top)
    P += line([(2,0),(2,f(2)),(math.sqrt(top),f(2)),
               (math.sqrt(top),0)], rgbcolor='black')
    P += line([(math.sqrt(top),f(math.sqrt(top))),
               (top,f(math.sqrt(top))),(top,0)], rgbcolor='black')
    # Red: the same rectangles with their widths divided by top
    P += line([(2,0),(2,f(2)),(2+(math.sqrt(top)-2)/top,f(2)),
               (2+(math.sqrt(top)-2)/top,0)], rgbcolor='red')
    P += line([(math.sqrt(top),f(math.sqrt(top))),
               (math.sqrt(top)+(top-math.sqrt(top))/top,
                f(math.sqrt(top))),(math.sqrt(top)+
               (top-math.sqrt(top))/top,0)], rgbcolor='red')
    P.show(ymax=2)


The picture confirms our analytic proof that the limit of $\frac{\Theta(x)}{x}$ is the same
as that of $\frac{\pi(x)\log(x)}{x}$, which is what we desired!
21.5 Exercises


1. Consider Wilson’s Theorem and consider what will happen to $(j-2)!$
modulo primes and composites (this is Exercise 7.7.8). Use this to prove
the bizarre formula in Section 21.1.

2. Calculate $\phi(n,a)$ (recall Definition 21.1.7) for various composite $n$ between
10 and 100 for $a = 2, 3, 4$ and compare to $\phi(n)$.

3. Without looking at any links, reconstruct the proof of the infinitude of
primes mentioned in the first paragraph of the proof of Fact 21.1.5.

4. Come up with two functions $f(x)$ and $g(x)$ that both go to infinity as
$x \to \infty$, such that $f(x)$ is always ahead of $g(x)$, but $f$ and $g$ are asymptotic
(to each other).

5. Come up with two functions $f(x)$ and $g(x)$ that both go to infinity as
$x \to \infty$, but that switch the lead infinitely often and $f$ and $g$ are asymptotic.

6. Show that the two limits in the Prime Number Theorem are really equivalent.
That is, show that if $\lim_{x\to\infty} \pi(x)/Li(x) = 1$, then the other limit
with $x/\log(x)$ is also 1, and vice versa.

7. Find an arbitrarily long sequence of consecutive composite numbers using
factorials.

8. Come up with two functions $f(x)$ and $g(x)$ such that $f(x)$ is $O(g(x))$ and
$g(x)$ is $O(f(x))$, but they are not asymptotic.

9. Use Proposition 21.3.7 to show that $\lim_{x\to\infty} \pi(x)/x = 0$.

10. Show that if $n \geq 1000$ then

$$\frac{\log(2)+1}{\log(n)} < \frac{2}{\log(2)+\log(n)} = \frac{2}{\log(2n)}.$$

To do this, you should compare $2\log(n)$ and $(\log(2)+1)(\log(2)+\log(n))$
and their derivatives for $n = 1000$ and up, then divide the two expressions
appropriately. You will need to justify the calculus fact that if $f(x_0) >
g(x_0)$ and $f' > g'$ for $x > x_0$, then $f > g$ for $x > x_0$ as well. See any
calculus textbook²¹ for review of how derivatives work.


11. Verify that 3.394557 < joganty for $n > 1000$. (See the previous problems;
you will need to verify that the derivative of $\frac{\log(n)}{\log(2n+1)}$ is positive in
that range.) Also confirm that $\frac{n}{\log(n)}$ is increasing.


Summary: The Prime Counting Function 


Here, we harness the power of the Legendre symbol to find a deep correlation 
between solutions of two seemingly unrelated congruences — a correlation that 
enables us to tell very quickly whether any quadratic congruence has a solution! 


1. Section 21.1 introduces the prime counting function $\pi(x)$.

2. Section 21.2 gives some history and cool graphics to help suggest there
is some regularity in this behavior.

3. In the next section we state Theorem 21.3.1, and show that $\pi(x)$ is
$O(x/\log(x))$ in Proposition 21.3.7.

4. Then in Section 21.4 we see a small piece of the methods one might use
in proving the whole theorem.

The Exercises help fill in details of the proofs and give experience thinking
about asymptotic behavior.

²¹ activecalculus.org/single/C-1.html


Chapter 22 


More on Prime Numbers 


This chapter serves two purposes. First, there are all kinds of interesting facts
about prime numbers, and this chapter collates some of the ones I personally
find amazing. What are your favorites?

Secondly, exploring the wonderful world of primes will start us heading back 
toward other arithmetic functions, especially toward developing the language 
we’ll need to explore 7(x) more rigorously. 

There are lots of resources beyond this for exploring primes! One interesting
resource is Numberphile’s YouTube channel for prime videos¹. Paulo
Ribenboim has several well-known books about them, such as The Little Book
of Bigger Primes [E.4.17].

But for usability, I have to mention one of the best web sites about primes.
This is the Prime Pages², a labor of love by Chris Caldwell (emeritus at University
of Tennessee Martin). It’s just amazingly full of useful information,
but also quite user-friendly and usable for people with a large variety of backgrounds.
In particular, the ‘Top Twenty’s Complete Index’³ page has links to
the top twenty of just about every prime type you can imagine, a cornucopia
of information. My personal favorite is the prediction of when the first billion
digit prime will surface⁴.


22.1 Prime Races 


One of Chebyshev’s more interesting observations was that our familiar cate- 
gories of primes — the classes 4k + 1 and 4k + 3 — don’t always seem to have 
the ‘same size’. Before moving on, try solving the next question by hand. 


Question 22.1.1 How many primes of each type are there up to $n = 10$,
$n = 20$, and $n = 50$? Try making a table.


You can try it by hand, or we can, as always, use computational power 
below to try to see more. (We saw this computation in a different context way 
back in Section 7.6.) 


import itertools

@interact
def _(n=7):
    # Pair the primes 1 (mod 4) with the primes 3 (mod 4), padding the shorter list
    L = itertools.zip_longest([p for p in prime_range(n+1) if p%4==1],
                              [p for p in prime_range(n+1) if p%4==3])
    L = [['',l[1]] if l[0] is None else l for l in L]
    T = [[r'$p\equiv 1\text{ (mod }4)$',r'$p\equiv 3\text{ (mod }4)$']]
    pretty_print(html(table(T+L, header_row=True, frame=True)))

¹ bit.ly/primevids
² t5k.org
³ t5k.org/top20/index.php
⁴ t5k.org/notes/by_year.html#3


@interact
def _(k=100):
    p1 = 0
    p3 = 0
    # prime_range(k) lists the primes below k
    for i in prime_range(k):
        if i%4==1:
            p1 += 1
        if i%4==3:
            p3 += 1
    pretty_print(html("Up to $k=%s$, there are"%k))
    pretty_print(html(r"%s primes $p\equiv 1\text{ (mod }4)$ and "%p1))
    pretty_print(html(r"%s primes $p\equiv 3\text{ (mod }4)$."%p3))

Question 22.1.2 Do you detect the bias Chebyshev did? Do you think it will 
persist? 


22.1.1 Infinitude of types of primes 


Of course, for this question to make sense, we need to make sure this ‘prime 
race’ won’t suddenly run out of gas. We know there are infinitely many primes, 
but what about each type of prime? 


Fact 22.1.3 There are infinitely many primes congruent to 3 modulo 4 and
there are infinitely many primes congruent to 1 modulo 4.
Proof. See the following two Propositions 22.1.4 and 22.1.5. ■
It turns out that proving the first part of this fact is nearly as easy
as proving the Infinitude of Primes. But the second part seems to require
something equivalent to the idea of a square root of $-1$ existing modulo some
primes but not modulo others (recall Fact 16.1.2). These proofs are standard;
we follow the notation in [E.2.1].


Proposition 22.1.4 Infinitude of primes 3 (mod 4). There is no largest
prime congruent to 3 modulo 4.
Proof. We’ll prove this by contradiction. Suppose $p_1, p_2, \ldots, p_k$ is the (finite)
set of all primes congruent to 3 modulo 4.
Form the product of all these primes, together with four; then subtract one to
let

$$m = 4p_1p_2\cdots p_k - 1.$$

What are the prime divisors of this number?

• Clearly none of the $p_i$ can be a prime divisor, since $m$ is congruent to $-1$
modulo all the $p_i$.

• Since $m$ is not even, it is also not divisible by a power of 2.

• If $m$ were a product only of primes congruent to 1 modulo 4, then it
would have to be 1 modulo 4 itself (since any product of 1s is 1).

• That is false (by construction $m \equiv -1 \equiv 3$ modulo 4), so there must be a
prime congruent to 3 modulo 4 which divides it, which cannot be on the original
list of $p_i$.

This contradicts our assumption of having the full set of such primes, so that
assumption must have been wrong. ■
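The construction in this proof is easy to try out. Below is a minimal sketch in plain Python that builds the proof's number $m$ from a (deliberately incomplete) list of primes congruent to 3 modulo 4 and picks out the prime factors that are 3 modulo 4. The function names are illustrative, and trial division is only sensible because the numbers here are small.

```python
def prime_factors(m):
    """Prime factors of m, with repetition, by trial division (fine for small m)."""
    factors = []
    d = 2
    while d * d <= m:
        while m % d == 0:
            factors.append(d)
            m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors

def new_primes_3_mod_4(known):
    """Form m = 4 * (product of known) - 1; return m and its prime factors
    that are 3 (mod 4) and not in the known list."""
    m = 4
    for p in known:
        m *= p
    m -= 1
    return m, [q for q in prime_factors(m) if q % 4 == 3 and q not in known]

print(new_primes_3_mod_4([3, 7, 11]))   # (923, [71]), since 923 = 13 * 71
```

Note that the other factor, 13, is 1 modulo 4; the proof only promises at least one factor of the right type.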


Proposition 22.1.5 Infinitude of primes 1 mod 4. There is no largest
prime congruent to 1 modulo 4.

Proof. As usual, suppose there are finitely many primes $p_i$ which are congruent
to 1 modulo 4. Let’s form the modified product

$$m = (2p_1p_2\cdots p_k)^2 + 1.$$

What are its prime divisors?

For the same reasons as in the proof of Proposition 22.1.4, it is clear that $m$
is odd and that it is also not divisible by any of the $p_i$. Initially, one might
assume one could also modify that argument to show that at least one of the
primes $p$ which divides $m$ is not 3 modulo 4.

Unfortunately, as $3^2$ is congruent to 1 modulo 4, this argument fails. However,
we can use an indirect argument.

For any prime divisor $p$ of $m$ and for $x = 2p_1p_2\cdots p_k$, we have $m = x^2 + 1 \equiv
0 \pmod{p}$. So $-1$ is a quadratic residue modulo $p$, by definition of quadratic
residues! Because of Fact 13.3.2, this can only happen if $p \equiv 1 \pmod{4}$.
(Compare with Theorem 13.5.5, where even powers of primes of the form $4k+3$
were allowed; here, they are completely prohibited.)

Since this $p$ wouldn’t be one of the $p_i$, its existence contradicts the assumption
that we already had all such primes. ■
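Here too one can watch the argument at work. The sketch below, in plain Python with illustrative function names, forms the proof's number $m$ from a short list of primes congruent to 1 modulo 4 and factors it by trial division. Every prime factor that appears should be 1 modulo 4 and absent from the starting list.

```python
def prime_factors(m):
    """Prime factors of m, with repetition, by trial division (fine for small m)."""
    factors = []
    d = 2
    while d * d <= m:
        while m % d == 0:
            factors.append(d)
            m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors

def new_primes_1_mod_4(known):
    """Form m = (2 * product of known)^2 + 1 and return m with its prime factors."""
    x = 2
    for p in known:
        x *= p
    m = x * x + 1
    return m, prime_factors(m)

for known in ([5], [5, 13], [5, 13, 17]):
    m, factors = new_primes_1_mod_4(known)
    print(known, m, factors, [q % 4 for q in factors])
```

Unlike the previous proposition, here all of the prime factors land in the desired class, not just one of them.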


22.1.2 Back to bias 


Now that we know we will always have primes of both kinds, let’s return to 
the prime race. From what we’ve seen previously, it looks like the 4k + 3-type 
primes will always stay ahead. But that’s not quite right. The next Sage cell 
computes one place where they fall behind. 


def prime_race_up_to_n(n):
    p1 = 0
    p3 = 0
    # Count the primes below n in each class modulo 4
    for i in prime_range(n):
        if i%4==1:
            p1 += 1
        if i%4==3:
            p3 += 1
    pretty_print(html(r"Up to $n=%s$, there are:<ul><li>%s primes $p\equiv 1\text{ (mod }4)$</li><li>%s primes $p\equiv 3\text{ (mod }4)$.</li></ul>"%(n,p1,p3)))

@interact
def _(n=[26860,26862,26864,26880]):
    prime_race_up_to_n(n)
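If you would rather search for the first inversion than be handed it, a sieve makes this quick. What follows is a minimal sketch in plain Python; the function name and the search limit are illustrative assumptions. It walks through the odd primes in order and stops at the first prime where the $4k+1$ count exceeds the $4k+3$ count.

```python
def first_lead_of_1_mod_4(limit):
    """Return (p, count_1_mod_4, count_3_mod_4) at the first prime p <= limit
    where the 4k+1 team is strictly ahead, or None if that never happens."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(range(i*i, limit + 1, i)))
    p1 = p3 = 0
    for n in range(3, limit + 1, 2):
        if sieve[n]:
            if n % 4 == 1:
                p1 += 1
            else:
                p3 += 1
            if p1 > p3:
                return n, p1, p3
    return None

print(first_lead_of_1_mod_4(30000))
```

The prime it reports, 26861, sits just below 26862, which is why that value appears in the selector above.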


There are other n for which we have such an ‘inversion’ as well, and it can 
be fun to look for them. The next such time is over six hundred thousand, for 
a little while; after that, you have to look at n over twelve million. Indeed, 
there is a theorem that there are infinitely many times where this will happen,
and that the ‘wrong’ team will get ahead by at least a specified amount.


Fact 22.1.6 No matter how far out you go, there exists an $n$ where the $4k+1$
team is ahead at $n$ by at least

$$\frac{1}{2}\frac{\sqrt{n}}{\log(n)}\log(\log(\log(n))).$$

You may not be surprised to learn that this result is also due to Littlewood,
the early contributor in studying the race between $\pi$ and Li back in Fact 21.2.6.
That his result is highly nontrivial is seen in the following graphic, which plots
the difference between the ‘teams’ up to the first place the $4k+1$ type is ahead.


Figure 22.1.7 Difference in prime teams up through n = 26862


Try the interactive version to see what happens beyond that. 


@interact
def _(n=26862):
    L = []
    p1 = 0
    p3 = 0
    # Record the running difference (4k+1 count minus 4k+3 count) at each prime
    for i in prime_range(n):
        if i%4==1:
            p1 += 1
            L.append([i,p1-p3])
        if i%4==3:
            p3 += 1
            L.append([i,p1-p3])
    P = plot(1/2*sqrt(x)/log(x)*log(log(log(x))),(x,10,n+10))
    P += plot_step_function(L)
    show(P)


Even though we can see the difference surge to become positive a few times, 
it seems hopeless for the 4k + 1 team to ever get ahead by as much as the 
extremely slowly growing log(log(log(x))). But it does. 
22.1.3 Other prime races


There are many races we can check out, and mathematicians have. (Indeed, 
this section is indebted to the excellent expository article [E.7.3], which has a 
host of recent references.) What is the pattern here, for modulus eight? 


Figure 22.1.8 Difference in prime teams modulo eight (legend: 1, 3, 5, and 7 (mod 8))


It might be a little tough to see, so feel free to use the interactive version 
below if you are online. 


@interact
def _(n=100):
    p1,p3,p5,p7 = 0,0,0,0
    L1 = []
    L3 = []
    L5 = []
    L7 = []
    # Running count of primes in each odd residue class modulo 8
    for i in prime_range(n):
        if i%8==1:
            p1 += 1
            L1.append([i,p1])
        elif i%8==3:
            p3 += 1
            L3.append([i,p3])
        elif i%8==5:
            p5 += 1
            L5.append([i,p5])
        elif i%8==7:
            p7 += 1
            L7.append([i,p7])
    L1.append([n,p1])
    L3.append([n,p3])
    L5.append([n,p5])
    L7.append([n,p7])
    P = Graphics()
    P += plot_step_function(L1,color='red',legend_label='1 (mod 8)')
    P += plot_step_function(L3,color='green',legend_label='3 (mod 8)')
    P += plot_step_function(L5,color='blue',legend_label='5 (mod 8)')
    P += plot_step_function(L7,color='orange',legend_label='7 (mod 8)')
    show(P, xmin=max(0,n-1000), ymin=max(0,L1[-1][1]-100))


It turns out there are several types of theorems/conjectures one can make 
about such races. The key observation (which we will not explain here) is that 
the ‘slow’ teams are the residue classes [a] such that nk + a can be a perfect 
square (see Exercise 22.4.2). In the two examples we showed graphically, only 
4k + 1 and 8k + 1, respectively, are possible perfect (odd) squares, and they 
are the ‘slow’ teams. See also Exercise 22.4.3. 
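The claim about which residue classes can contain perfect squares is simple to check by brute force. This is a small sketch in plain Python with an illustrative function name; it lists the classes coprime to the modulus that contain a square, which by the observation above are the candidates for ‘slow’ teams.

```python
from math import gcd

def square_classes(n):
    """Residue classes a (mod n), coprime to n, that contain a perfect square."""
    return sorted({(k * k) % n for k in range(n) if gcd(k, n) == 1})

for n in [3, 4, 5, 8, 12]:
    print(n, square_classes(n))
```

For moduli 4 and 8 only the class of 1 shows up, matching the two races graphed above; for modulus 5 there are two such classes.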

Nonetheless, for any $a, b$ coprime to each other and to $n$,

$$\lim_{x\to\infty}\frac{\text{Number of } p \equiv a \pmod{n} \text{ less than } x}{\text{Number of } p \equiv b \pmod{n} \text{ less than } x} = 1$$

so the teams can’t get too far away from each other, at least not on a
percentage basis. The more specific result that the numerator and denominator
are both asymptotic to $\frac{1}{\phi(n)}\frac{x}{\log(x)}$ is often called the prime number theorem for
arithmetic progressions, and it was also proved by Vallée-Poussin. (See also
Subsection 22.2.1.)

With such a close connection to Chapter 21, at this point you won’t be
surprised to learn that, even though some teams are usually ahead, just as
with $\pi$ and Li, each team does get ahead in the race infinitely often. But if
you “count right” (and assume some other technical but important hypotheses),
the proportion of the time the ‘wrong’ teams are ahead in the race is very small.
(See the article [E.7.3] for more details.)


22.2 Sequences and Primes 


22.2.1 Primes in sequences 


There is an interesting question implicit in the prime races. To legitimize 
doing the first prime race, we proved that there are infinitely many primes of 
the forms 4k + 1 and 4k + 3. However, we then proceeded to do prime races 
for several other such forms. Is it legitimate to do so? 

The answer is yes, as proved in this major theorem of 1837 that introduced 
limiting and calculus methods to the study of number theory. 


Theorem 22.2.1 Dirichlet’s Theorem on Primes in an Arithmetic 
Progression. If gcd(a,b) = 1, then there are infinitely many primes of the 
form ax +b for x an integer. 
Proof. The proof of this theorem is far beyond the level of this text, but [E.4.6]
is a standard resource for this. ■
That is, ax +b defines a progression of numbers separated always by a, and 
this theorem says there are infinitely many primes in any such progression that 
makes sense in terms of relative primeness. It is a weak version of a prime race; 
it just says that it makes sense to do them, though (as we saw) there is much 
more information one can glean from them. 


@interact
def _(a=8,b=7,n=100):
    if gcd(a,b)!=1:
        pretty_print(html("Oops!  The progression won't have many primes if"))
        pretty_print(html("$a$ and $b$ share a common factor!"))
    else:
        pretty_print(html("Primes of the form $%sx+%s$ up to $%s$:"%(a,b,n)))
        for x in prime_range(n):
            # Compare residues, so that values of b outside 0..a-1 also work
            if x%a==b%a:
                print(x)


We have already proved this for a = 4. It is easy to prove for a = 2! (See 
Exercise 22.4.4.) 

It is also possible to prove the theorem for $b = 1$, or $b = -1$, without developing
much bigger tools. In the article [E.7.1] a lot of factoring and expanding
is used, and a much more recent article by Xianzu Lin [E.7.7] is similarly
elementary; see also [E.2.16, Theorem 5.3.4]. One can even prove Dirichlet’s
theorem without Dirichlet’s methods for any $b$ such that $b^2 \equiv 1 \pmod{a}$, but
doing so involves some high-level details about polynomial factorization (see
Murty and Thain’s paper for details⁵).


Historical remark 22.2.2 Lejeune Dirichlet. Johann Peter Gustav Lejeune
Dirichlet, as his name suggests, was from a world where ethnicity and state
borders were not necessarily the same. He was born into a part of Germany
occupied by Napoleon, whose defeat sent it back to Prussia; as a young man,
he studied and worked in Paris, but spent most of his professional career in
Prussia (including Berlin and Göttingen).

In addition to the theorem in this section, Dirichlet made major contribu- 
tions to the solution of Fermat’s Last Theorem and introduced Dirichlet Series. 
He also worked in fluid dynamics and trigonometric series; it was in the lat- 
ter research that he introduced functions that are nowhere continuous, which 
eventually were determined to not be integrable under the definitions then 
available. Naturally, this paper was written in French, in a German journal. 


22.2.2 Sequences in primes 


We can also look at the opposite question. Instead of considering whether
primes exist in a given arithmetic progression, are there arithmetic progressions
made solely of primes?


Question 22.2.3 Can you get a (finite) sequence of the form

$$ak + b,\quad k = 0, 1, 2, 3, \ldots, n$$

where all entries are prime?


It’s easy to find short arithmetic progressions in the primes. We say such
a progression has length $n + 1$ in the above notation.


• 3, 5, 7 is an arithmetic progression of length 3, where $a = 2$.
• 41, 47, 53, and 59 is an arithmetic progression of length 4, where $a = 6$.


Longer ones get harder to find. Can you find a progression of length 5? 
(This is Exercise 22.4.5; there is a small one where the differences and starting 
number are both less than 10. See also Exercise 22.4.6.) 


⁵ projecteuclid.org/download/pdf_1/euclid.facm/1229442627
@interact
def _(p = prime_range(200), n=110):
    # Five terms starting at the prime p with common difference n
    L = [p,p+n,..p+4*n]
    for z in L:
        if is_prime(z):
            print(z)
        else:
            print(factor(z))
            break


Fact 22.2.4 There is such a sequence of length 10 starting at 199, with differ- 
ences of 210. 
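This fact can be confirmed directly. The following is a minimal sketch in plain Python; the helper names are illustrative, and trial division suffices because the terms involved are small. It counts how many consecutive terms of a progression are prime before the first composite appears, and assumes a positive common difference.

```python
def is_prime(n):
    """Primality by trial division (adequate for the small numbers used here)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def progression_length(start, diff):
    """Number of consecutive prime terms start, start+diff, start+2*diff, ...
    Assumes diff > 0 so that the terms grow and the loop eventually stops."""
    length = 0
    while is_prime(start + length * diff):
        length += 1
    return length

print(progression_length(3, 2))      # 3, 5, 7 and then 9 is composite
print(progression_length(41, 6))     # 41, 47, 53, 59 and then 65 is composite
print(progression_length(199, 210))  # the progression in Fact 22.2.4
```

The last call reports 10; the eleventh term, 2299, is divisible by 11.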


Question 22.2.5 Can we find arbitrarily long such sequences in the primes?


Once again, the answer is yes! This is a theorem of Ben Green and Terry 
Tao, which was a significant piece of Tao’s 2006 Fields Medal (though he prob- 
ably would have won it even without this, remarkable as it may seem). How 
might one prove this? That might seem mysterious, so we give the gist of an 
approach. 

Remember how there seem to be fewer primes the further out we go, even 
in an arithmetic subsequence (e.g. prime mod 4 or mod 8)? That isn’t a 
coincidence. There is a technical way to measure this: 


$$\lim_{n\to\infty}\frac{\pi(n)}{n} = 0.$$

This follows from Chebyshev’s estimate in Theorem 21.3.6, and is called having
zero density. We can try estimating this for $\pi$ with specific numbers:


• $\pi(100)/100 = 1/4 = 0.25$

• $\pi(200)/200 = 0.23$

• $\pi(1000)/1000 = 0.168$, or under 17%.

• $\pi(1000000)/1000000 \approx 0.0785$, or under 8%.
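These densities are quick to reproduce. Here is a small sketch in plain Python; the names `primes_upto` and `count_primes` are illustrative stand-ins, used so that nothing depends on Sage's built-in `prime_pi`.

```python
import math
from bisect import bisect_right

def primes_upto(n):
    """Sieve of Eratosthenes: list of all primes <= n (assumes n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(range(i*i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

PRIMES = primes_upto(10**6)

def count_primes(x):
    """Number of primes <= x, valid for x up to 10**6."""
    return bisect_right(PRIMES, x)

for n in [100, 200, 1000, 10**6]:
    print(n, count_primes(n), count_primes(n) / n)
```

The printed ratios are 0.25, 0.23, 0.168, and 0.078498, in line with the list above.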


Now, if you have a collection of numbers which has positive density (i.e. 
the limit is positive, not zero), it is a theorem from 1974 (by Endre Szemerédi) 
that you can get arithmetic progressions of arbitrary length in such sets. Sadly, 
even our data suggests the primes are indeed approaching zero density. 

But Green and Tao managed to show this type of method still works for 
the primes! You can’t get arithmetic progressions in just any old set with 
zero density; but somehow, although there are not many primes, there are just 
enough for things to work. 

If you are interested in the current status of really long sequences, see
Norman Luhn’s curation of the original primerecords.dk website⁶. The first
example of length 27 was found in 2019; evaluate the following cell to see the
whole thing.


difference = 81292139*2*3*5*7*11*13*17*19*23
start = 224584605939537911
for n in [0..26]:
    print(start+n*difference, is_prime(start+n*difference))

224584605939537911 True
...
696112717486210091 True

⁶ www.pzktupel.de/JensKruseAndersen/aprecords.php


There have only been three record 27-length sequences, as of this writing, 
and there are no known 28-length sequences (though they must exist, by the 
Green/Tao theorem). They must even obey the following ridiculous bound 
(published in a followup to the original paper). 


Fact 22.2.6 A sequence of length k must occur before 


How do people find such lists? For that, we need a new notation. 
Definition 22.2.7 For a prime $p$, we call the primorial the number

$$p\# = \prod_{q \leq p,\ q \text{ prime}} q$$

where the “$p$ sharp” or “$p$ hash”⁷ denotes $p$ primorial. ◊
Armed with primorials, one usually finds such lists by the following method.


• First, for some fixed $p$, compute a large set of primes of the form $a\cdot p\#+1$,
keeping track of the $a$ values in question.


• Next, find arithmetic progressions among the values of $a$ from your list
(not the values of $a\cdot p\#+1$).


• If you find a bunch of $a$ values in a progression of the form $k + \ell\cdot n$, then
you’ve also found a progression of primes of the form $(k\cdot p\#+1)+(\ell\cdot p\#)n$.
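The three steps above can be sketched at toy scale. The code below is plain Python under stated assumptions: the function names, the choice $p = 7$, the bound of 50 on the multipliers, and the target length 3 are all illustrative, and real record searches use far larger parameters and much better primality tests than trial division.

```python
def is_prime(n):
    """Primality by trial division (adequate for this toy example)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def primorial(p):
    """Product of all primes q <= p."""
    result = 1
    for q in range(2, p + 1):
        if is_prime(q):
            result *= q
    return result

def prime_multipliers(p, bound):
    """Step 1: the values a <= bound for which a * p# + 1 is prime."""
    P = primorial(p)
    return [a for a in range(1, bound + 1) if is_prime(a * P + 1)]

def progressions(values, length):
    """Step 2: all (first term, step) pairs giving an arithmetic progression
    of the requested length (at least 2) inside the list of values."""
    vs = set(values)
    found = []
    for k in values:
        for step in range(1, (max(values) - k) // (length - 1) + 1):
            if all(k + step * j in vs for j in range(length)):
                found.append((k, step))
    return found

P = primorial(7)                      # 2 * 3 * 5 * 7 = 210
a_values = prime_multipliers(7, 50)
for k, step in progressions(a_values, 3)[:5]:
    # Step 3: translate a progression of multipliers into a progression of primes
    print([(k + step * j) * P + 1 for j in range(3)])
```

The first progression found comes from the multipliers 1, 2, 3, giving the primes 211, 421, 631.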


If you want to, you can even sign up to find a length 27 sequence at the
PrimeGrid distributed search⁸!


22.3 Types of Primes 


There are many types of primes we have encountered up to this point. For 
instance: 


• Germain (Subsection 11.6.4)
• Mersenne (Subsection 12.1.3)
• repunit (Exercise 6.6.1)


Notice that for many of these types, we don’t know if there are finitely many 
or not! Are there any conjectures for how often certain types of primes might 
appear? 


22.3.1 Twin primes 


Consider primes in an arithmetic progression $ax + b$. Can one say anything
about the constants involved in these progressions? Since $b$ is pretty arbitrary,
we would focus on $a$. Here are some natural questions along these lines.


⁷ Officially, this should be called an octothorp(e).
⁸ www.primegrid.com/forum_thread.php?id=7022

Question 22.3.1 Consider the following for small values of $a$.


• Find some primes that look like $2x + b$ for some $b$ and several consecutive
$x$. How many $x$ in a row can you do?


• How about for $3x + b$?
• What about $4x + b$?


• Are the primes you get in these cases ever consecutive?


Hopefully it’s pretty clear that you can’t do every possible combination of 
b and a, nor can every such progression go on indefinitely! Why? 

Thinking about this and the Sieve of Eratosthenes led the French mathe- 
matician Alphonse de Polignac to the following. 


Conjecture 22.3.2 Polignac’s Conjecture. Every even number is the 
difference between consecutive primes in infinitely many ways. 

We have no proof of this. In fact, even the most basic case of Polignac’s
conjecture is one of the most celebrated open questions in number theory:
celebrated enough that well-known comedian Stephen Colbert interviewed Fields
medalist Tao about it⁹.


Conjecture 22.3.3 Twin prime conjecture. There are infinitely many
pairs of consecutive odd numbers that are both prime.


Definition 22.3.4 Pairs of primes $p$ and $q$ such that $p + 2 = q$ are called twin
primes. ◊

There are lots of twin primes. The following cell computes twin prime pairs, 
numbered by which twin prime pair it is. The pair 17 and 19 is the fourth pair, 
for example. 


def twin_primes_upto(n):
    v = prime_range(n+1)
    L = []
    counter = 0
    # Consecutive primes that differ by 2 form a twin prime pair
    for i in range(len(v)-1):
        if v[i+1]-v[i]==2:
            counter += 1
            L.append((v[i],v[i+1],counter))
    return L


twin_primes_upto(100)


[(3, 5, 1), 
(5, 7, 2), 
(11, 13, 3), 
(17, 19, 4), 
(29, 31, 5), 
(41, 43, 6), 
(59, 61, 7), 


(71, 73, 8)] 


We can use similar searching to try to see whether there are enough that 
there are infinitely many such pairs. Here are two sample graphics. 


⁹ www.cc.com/video-clips/6wtwlg/the-colbert-report-terence-tao

Figure 22.3.5 Estimating number of twin primes through n = 10000 and
n = 1000000


You can see in the preceding graphic that it’s certainly possible to approximate
the twin prime counting function in a similar way to how we approximated
the prime counting function $\pi$. There is a mysterious constant $C_2$ I’ve used; it
will be explained below.


22.3.2 Heuristics for twin primes 


To explain how to get to twin primes, there is a nice little rule of thumb; see 
e.g. [E.4.5] for what follows. Even though we definitely do not have a proof, 
we can still give you a good idea of how these ideas come about. 

First, one might want to estimate how many primes there are up to a certain 
point to start. The problem is we should use a different idea than just looking 
at tables! What can we say that is a little smarter? 


• About half the numbers less than $n$ are not divisible by 2.
• About 2/3 the numbers less than $n$ are not divisible by 3.
• About 4/5 the numbers less than $n$ are not divisible by 5.
• Etc. for each prime less than $\sqrt{n}$...


If we take this thinking to its logical extreme, you might even expect that

$$\prod_{p<\sqrt{x}}\left(1-\frac{1}{p}\right)$$

is a good approximation of the probability that a given number $x$ is prime.
Unfortunately, it isn’t. In fact, this product turns out to be asymptotic to
$2e^{-\gamma}/\log(x)$ (recall that $\gamma$ from Definition 20.3.10).
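To see how far the naive product is from $1/\log(x)$, one can simply compute it. The sketch below is plain Python with illustrative names; the numerical value of $\gamma$ is typed in as a constant, and the product runs over primes up to and including $\lfloor\sqrt{x}\rfloor$, which makes no difference for the sample values chosen.

```python
import math

EULER_GAMMA = 0.5772156649015329  # numerical value of the constant gamma

def primes_upto(n):
    """Sieve of Eratosthenes: list of all primes <= n (assumes n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(range(i*i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def sieve_product(x):
    """Product of (1 - 1/p) over primes p up to the square root of x."""
    prod = 1.0
    for p in primes_upto(math.isqrt(x)):
        prod *= 1 - 1 / p
    return prod

for x in [10**4, 10**6, 10**8]:
    # columns: x, the product, 2 e^(-gamma) / log(x), and 1 / log(x)
    print(x, sieve_product(x), 2 * math.exp(-EULER_GAMMA) / math.log(x), 1 / math.log(x))
```

The product tracks the second column closely and stays roughly 12% above the third, which is the sense in which it overestimates the chance of being prime.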

Still, this kind of thinking is helpful, and might give us ideas
for how many twin primes there are, especially if we keep in mind this isn’t
really a probability. After all, if $p > 2$ is prime, then with one hundred percent
probability the next number is not prime! And for $p$ and $p + 2$ to be both
prime, they must also both be odd; so if $p$ is odd, then $p + 2$ is much more
likely than a random number to be prime.

So we do the following analysis instead. (See Exercises 22.4.11 and 22.4.12.) 
• Although one would expect 1/4 of all pairs separated by two to both
be odd, $n + 2$ has the same parity as $n$, so we should expect 1/2 the pairs
to both be odd.


• The chances that $n$ and $n + 2$ are both not divisible by three is 1/3.


• The chances that $n$ and $n + 2$ are both not divisible by five is 3/5.
• And so forth.

So, having gotten a little more sophisticated, we might expect that

$$\frac{1}{2}\prod_{p<\sqrt{x},\,p>2}\left(1-\frac{2}{p}\right)$$

is a decent approximation of the probability that a given pair of consecutive
odd numbers are both prime.

This doesn’t look so recognizable yet, but we can do some algebra to turn
this into something that looks better and has logarithms, just like in the prime
number theorem. If we substitute

$$\left(1-\frac{2}{p}\right) = \left(1-\frac{1}{(p-1)^2}\right)\left(1-\frac{1}{p}\right)^2$$

then the approximation of the number of twin primes less than $x$ looks more
like this:

$$\frac{x}{2}\prod_{p<\sqrt{x},\,p>2}\left(1-\frac{1}{(p-1)^2}\right)\prod_{p<\sqrt{x},\,p>2}\left(1-\frac{1}{p}\right)^2$$

Finally, if we now use the earlier suggestion about the right-hand product being
more or less the square of the proportion of primes, about $1/\log(x)^2$ (the factor
$(1-\frac{1}{2})^2 = \frac{1}{4}$ missing from that product turns the $\frac{1}{2}$ in front into a 2),
we come up with a reasonable suggestion that looks more familiar.

$$2\prod_{p<\sqrt{x},\,p>2}\left(1-\frac{1}{(p-1)^2}\right)\frac{x}{\log(x)^2}$$


Remark 22.3.6 The constant part of this formula is finite, and known as the
twin prime constant:

$$C_2 = 2\prod_{p>2}\left(1-\frac{1}{(p-1)^2}\right).$$

The graphs in Subsection 22.3.1 use this constant (which is built-in in Sage) as
well as a logarithmic integral version of the preceding analysis.

There is some inconsistency in the literature about whether the 2 in front
of the formula for $C_2$ is part of the twin prime constant or not.
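Both the substitution used in the heuristic and the size of the constant can be checked without Sage. The following is a minimal sketch in plain Python; the function names and the cutoff of one million are illustrative assumptions, and the product is only a truncation of the infinite one.

```python
import math
from fractions import Fraction

def primes_upto(n):
    """Sieve of Eratosthenes: list of all primes <= n (assumes n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(range(i*i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def substitution_holds(p):
    """Exact check that 1 - 2/p = (1 - 1/(p-1)^2) * (1 - 1/p)^2."""
    left = 1 - Fraction(2, p)
    right = (1 - Fraction(1, (p - 1) ** 2)) * (1 - Fraction(1, p)) ** 2
    return left == right

def twin_prime_constant(limit):
    """Truncated product 2 * prod over odd primes p <= limit of (1 - 1/(p-1)^2)."""
    c = 2.0
    for p in primes_upto(limit):
        if p > 2:
            c *= 1 - 1 / (p - 1) ** 2
    return c

print(all(substitution_holds(p) for p in primes_upto(100) if p > 2))
print(twin_prime_constant(10**6))
```

The truncated product agrees with the value 1.3203236... to several decimal places, since the omitted factors are all extremely close to 1.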


This also leads to a conjecture of Hardy and Littlewood. 
Conjecture 22.3.7 The number of ways to write an even number 2k as a sum 
2 
of primes is also asymptotic to ape ees (1 = (az) : 


This would provide a very overwhelming proof of the following old sug- 
gestion, going back to correspondence between Euler and Prussian/Russian 
mathematician Christian Goldbach. 


Conjecture 22.3.8 Goldbach Conjecture. Any even number greater than 2 can
be written in at least one way as a sum of two primes.
In fact, there are two such conjectures, with the other one suggesting that
any odd number greater than 5 may be written as a sum of three primes. There is a
proof claimed¹⁰ for the latter ‘weak’ conjecture¹¹, but it has not appeared in
a peer-reviewed journal yet.
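The two-prime version is easy to test for small even numbers. Here is a short sketch in plain Python; the function name and the limit of 2000 are illustrative. It counts, for each even number from 4 up to the limit, the unordered ways of writing it as a sum of two primes.

```python
import math

def goldbach_counts(limit):
    """Map each even n with 4 <= n <= limit to the number of unordered
    pairs of primes (p, q), p <= q, with p + q = n."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(range(i*i, limit + 1, i)))
    counts = {}
    for n in range(4, limit + 1, 2):
        counts[n] = sum(1 for p in range(2, n // 2 + 1) if sieve[p] and sieve[n - p])
    return counts

counts = goldbach_counts(2000)
print(min(counts.values()))        # at least 1 if the conjecture holds in this range
print(counts[10], counts[100])     # 10 = 3+7 = 5+5, and 100 has six such sums
```

Of course, a finite check like this is evidence and not a proof.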


Historical remark 22.3.9 The Pentium bug. Returning to the twin prime 
constant, computing it (as in the Sage cell below) led to a very interesting real- 
life application. 


2*twinprime.n() 


1.32032363169374 


Computing this constant to arbitrary precision led to the discovery of the
infamous Pentium chip bug¹², where some floating-point calculations would be
incorrect in high decimal places. This is a quite surprising ‘application’ of
number theory! (It turns out manufacturers do use number-theoretic computations
to stress-test their products. See also Historical remark 12.1.8.)


Historical remark 22.3.10 Twin prime status. It is still unknown
whether there are infinitely many twin prime pairs. In a 2013 result that
shocked the mathematics world, (then) unknown mathematician Yitang Zhang
proved¹³ that there exists some $N$ less than seventy million such that there are
infinitely many pairs of primes separated by exactly $N$. This was a huge
improvement over previous results, and further work of an unusually collaborative
nature¹⁴ has now reduced this bound to $N \leq 246$, but progress has stalled
since then. A related result about polynomials¹⁵ was proved in 2019, but
this doesn’t seem to have led closer to a final resolution, either.

As we finish this subsection, we must mention another constant affiliated
with twin primes. Although there may really be infinitely many pairs, the sum
of their reciprocals

$$\sum_{p,\,p+2 \text{ both prime}} \left(\frac{1}{p} + \frac{1}{p+2}\right)$$

is still a finite constant. At the very least, this means twin primes must be pretty
rare. This sum (possibly of infinitely many terms) is called Brun’s constant.
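Partial sums of this series are simple to compute, and they show how slowly it grows. The sketch below is plain Python with an illustrative function name; it adds the reciprocals of every twin prime pair whose larger member is at most the given limit.

```python
import math

def brun_partial_sum(limit):
    """Sum of 1/p + 1/(p+2) over twin prime pairs (p, p+2) with p + 2 <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(range(i*i, limit + 1, i)))
    total = 0.0
    for p in range(3, limit - 1, 2):
        if sieve[p] and sieve[p + 2]:
            total += 1 / p + 1 / (p + 2)
    return total

for limit in [10**2, 10**3, 10**4, 10**5]:
    print(limit, brun_partial_sum(limit))
```

The partial sums creep upward very slowly, which is part of why so few digits of the limiting value are known with certainty.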


Sage note 22.3.11 Sage can change. Originally, this constant was included 
in Sage. However, as nearly every digit of the constant is conjectural, it was 
removed as a built-in. 


brun.n(digits=5) 


Traceback (most recent call last): 


NameError: name 'brun' is not defined 


Because Sage is open source, you can follow discussions about decisions and 
additions to Sage functionality on the Sage developer Trac¹⁶ or sometimes on 
the Github organization¹⁷. 


10 arxiv.org/abs/1312.7748 
11 en.wikipedia.org/wiki/Goldbach's_weak_conjecture 
12 www.trnicely.net/pentbug/pentbug.html 
13 www.quantamagazine.org/20130519-unheralded-mathematician-bridges-the-prime-gap/ 
14 michaelnielsen.org/polymath1/index.php?title=Bounded_gaps_between_primes 
15 www.quantamagazine.org/big-question-about-primes-proved-in-small-number-systems-20190926/ 
16 trac.sagemath.org/ticket/18255 
17 github.com/sagemath/sage/commit/330a9a88a1c202d12f30111ac1cb49ff8ff43846 
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22.3.3 Other types of primes 


In the quest toward Polignac’s Conjecture, researchers have dubbed primes 
(not necessarily consecutive) with spacing N = 4 cousin primes and those 
N = 6 apart sexy primes. In another result of similar vintage to Zhang’s (and 
also collaborative like its refinement), we know (conditional upon the so-called 
“generalized Elliott-Halberstam conjecture¹⁸”, which is closely related to our 
investigations in Subsection 22.2.2) that at least one of the classes of twin, 
cousin, or sexy primes is infinite¹⁹ ²⁰. This is a very special case of exploring 
something called prime constellations; see Exercise 22.4.13. 

In addition, there are many other heuristics like the ones above. Here is a 
sampling of those we don’t have space or expertise in this text to dig further 
into. 


• As one example, consider the chance that n and 2n + 1 are both not 
divisible by a given prime p. Probabilistically, this is basically the same 
chance as that n and n + 2 are both not divisible by p, so it turns out 
that Germain primes might also be distributed in the same fashion as 
twin primes. 

• Using similar ideas, one can get a heuristic that Mersenne primes are 
distributed as 

$$e^{\gamma} \log(\log(x))/\log(2).$$

This is known as Wagstaff’s conjecture. 

• Bizarrely, one can use the same idea to get a heuristic for factorial 
primes. These are primes of the form n! ± 1, like 5, 7, 23, and 719. It’s 
conjectured that there are $e^{\gamma} \log(n)$ such primes less than n. 

• These rules of thumb even seem to apply to the so-called primorial 
primes, which are primes of the form p# ± 1, like 3, 5, 7, 29, 31, 211, etc. It’s 
truly weird, yet also cool. 
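The small examples of factorial and primorial primes listed above can be regenerated with a short search. This is only a sketch in plain Python with naive trial division; the function names `is_prime`, `factorial_primes`, and `primorial_primes` are our own and hypothetical, and the approach is far too slow for the large cases that make these heuristics interesting.

```python
from math import factorial

def is_prime(n):
    """Naive trial-division primality test, fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def factorial_primes(max_n):
    """Sorted primes of the form n! - 1 or n! + 1 for 1 <= n <= max_n."""
    found = set()
    for n in range(1, max_n + 1):
        for candidate in (factorial(n) - 1, factorial(n) + 1):
            if is_prime(candidate):
                found.add(candidate)
    return sorted(found)

def primorial_primes(max_p):
    """Sorted primes of the form p# - 1 or p# + 1 for primes p <= max_p."""
    found = set()
    primorial = 1
    for p in range(2, max_p + 1):
        if is_prime(p):
            primorial *= p  # p# is the product of all primes up to p
            for candidate in (primorial - 1, primorial + 1):
                if is_prime(candidate):
                    found.add(candidate)
    return sorted(found)

print(factorial_primes(7))
print(primorial_primes(11))
```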


There is so much to explore! There is never a lack of questions for mathe- 
maticians to explore when it comes to prime numbers. 


22.4 Exercises 


1. Explain why, to show that any number can be written as a sum of three 
primes, it suffices to prove Conjecture 22.3.8. 


2. In Subsection 22.1.3 a statement is made about residue classes [a] such 

that nk + a can be a perfect square. What is another name for such a? 

Also, the claim is made that, “In the two examples we showed graph- 
ically, only 4k + 1 and 8k + 1, respectively, are possible perfect (odd) 
squares.” Either prove this claim or find the reference for when that is 
proved in the book. 

3. What ‘teams’ would you expect to be in the lead long-term for a modulo 
ten prime race? Why? Compute a value where the ‘wrong’ team is in the 
lead, if you can! 

4. Prove Dirichlet’s Theorem on Primes in an Arithmetic Progression for the 
case a = 2. 


18 en.wikipedia.org/wiki/Elliott–Halberstam_conjecture 
19 resmathsci.springeropen.com/articles/10.1186/s40687-014-0012-7 
20 Go to the video of Tao’s interview with Colbert, linked just before Conjecture 22.3.3, 
again to see Colbert’s quite amusing reaction to this fact. 
5. Find an arithmetic progression of primes of length five with less than ten 
between primes. 

6. Find an arithmetic progression of primes of length six or seven, starting 
at a number less than ten. 

7. Prove that there can be only one set of “triple primes”, that is, three 
consecutive odd primes. 

8. Find the value of 23#. 

9. Compute some twin primes greater than one thousand. 

10. Show that $\left(1 - \frac{2}{p}\right) = \left(1 - \frac{1}{(p-1)^2}\right)\left(1 - \frac{1}{p}\right)^2$. 

11. What form must n have for n and n + 2 to both not be divisible by three? 

12. Which residues modulo five must n avoid for n and n + 2 to both not be 
divisible by five? 

13. Search a few resources to learn about “prime constellations” and write a 
report. The Prime Pages²¹ or Tomas Oliveira e Silva²²’s very nice graphs 
of “admissible” constellations are a good place to start. 

14. Find a definition for palindromic primes²³ (base 10, say) and report on 
the current known status. Are there infinitely many, or a way to generate 
them programmatically? 

15. Search in a good book (see the general E.2 or specialized E.4 references) 
or the internet for an amazing fact about primes. Describe it in a way 
your classmates (or peers, if you’re not in a course) will understand. 


Summary: More on Prime Numbers 


What else can we say about prime numbers? This chapter collates some of 
the most interesting questions. 

1. In Section 22.1 we see some exciting action in asking who wins various 
prime races! 

2. The next section states and gives examples of many facts about primes 
in sequence, including Dirichlet’s Theorem on Primes in an Arithmetic 
Progression and the Green-Tao theorem on sequences in primes. 

3. How many Types of Primes do you know? One of the most intriguing 
questions is why so many of the questions in this section are completely 
unanswered. 

The Exercises give practice in searching for interesting patterns in the primes. 


21 t5k.org/glossary/page.php?sort=PrimeConstellation 
22 sweet.ua.pt/tos/apc.html 
23 oeis.org/A002385 
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Chapter 23 


New Functions from Old 


We are heading toward the end of the text. There are even more interesting 
functions out there; just as important, there are more interesting ways to start 
connecting these functions to calculus. 

As a prelude, let us introduce an interesting function. Letting p run 
just over primes, we let 

$$D(N) = \prod_{p \le N} \left(1 - \frac{1}{p}\right)$$

and then expand the expression as a sum of unit fractions. As an example, 

$$D(3) = (1 - 1/2)(1 - 1/3) = \left(\frac{1}{1} - \frac{1}{2} - \frac{1}{3} + \frac{1}{6}\right).$$

Before starting this chapter, try expanding D (as above, without adding 
the fractions) for bigger and bigger values of N. What patterns do you find? 

• What denominators show up? 
• Which ones don’t? 
• For the ones that do, what are the values of the numerator? 
• Can you predict the value of the numerator for some types of denominators? 
(E.g., primes, perfect squares, prime powers, etc.) 


The function unveiled by this is quite important in expanding our roster of 
arithmetic functions and unlocking their secrets, as well as in connecting to 
calculus. 
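If you would like help with the bookkeeping while exploring, the expansion can be sketched in plain Python as follows. The name `expand_D` and the dictionary representation (denominator mapped to numerator) are our own choices for illustration, not anything built into Sage.

```python
from fractions import Fraction

def primes_up_to(limit):
    """Return the primes <= limit by naive trial division."""
    return [p for p in range(2, limit + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def expand_D(N):
    """Expand prod_{p <= N} (1 - 1/p) without combining fractions.

    Returns a dict mapping each denominator d to its numerator.
    """
    terms = {1: 1}  # the empty product is the single term 1/1
    for p in primes_up_to(N):
        new_terms = dict(terms)
        for d, numerator in terms.items():
            # multiplying numerator/d by -1/p gives -numerator/(d*p)
            new_terms[d * p] = -numerator
        terms = new_terms
    return terms

for N in (3, 5, 7):
    expansion = expand_D(N)
    print(N, sorted(expansion.items()))
    # the unit fractions add back up to the original product
    print(sum(Fraction(num, d) for d, num in expansion.items()))
```

Denominators that never appear as keys are just as interesting as the ones that do.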


23.1 The Moebius Function 


23.1.1 Möbius mu 


Let’s define the function which gives the numerator associated with denomina- 
tor n in the products above. 


Definition 23.1.1 Moebius mu. Let N = 2 · 3 · 5 ⋯ q be the product of 
the first few primes, up to q. Then we define $\mu(d)$ as follows: 

$$\prod_{p \mid N} \left(1 - \frac{1}{p}\right) = \sum_{d \mid N} \frac{\mu(d)}{d}.$$

The product is over prime factors of N but the sum is over all factors of N. 

It is not at all obvious that $\mu(d)$ will have the same value regardless of N, and 
much of the rest of this section will confirm this. 


Historical remark 23.1.2 August Möbius. Yes, this is the same August 
Moebius (or Möbius) as the Moebius strip; however, it was not he, but Johann 
Listing¹ who first discovered that object. On the other hand, his work with 
this function and the Möbius Inversion Formula has stood the test of time. A 
student of Gauss, Möbius mostly held positions as director of major observatories 
and professor of astronomy. See [E.7.26] for some historical details of 
the function, including Euler’s discovery of the same general idea via infinite 
products. 


Example 23.1.3 Using the example in the chapter introduction, 

$$D(3) = (1 - 1/2)(1 - 1/3) = \left(\frac{1}{1} - \frac{1}{2} - \frac{1}{3} + \frac{1}{6}\right)$$

implies that $\mu(2) = -1 = \mu(3)$ while $\mu(6) = 1 = \mu(1)$. 

There is no product of (1 − 1/p) that will yield a four in the denominator, 
since (1 − 1/2) only occurs once in such a product. So $\mu(4) = 0$, as the example 
above already implies. 


23.1.2 A formula 


Before describing this function further, let’s think more about the product 
$\prod_{p \le N} \left(1 - \frac{1}{p}\right)$. 

• First, as the comment at the end of the last subsection points out, it seems 
to create denominators with each prime factor to just the first power. We 
couldn’t get a square or cube of any given p in the denominator. 

• Similarly, the numerators really can only be products of 1 and −1. For 
a moment, think about why there are no other numerators available. 

• Finally, the number of prime factors in the denominator should be the 
same as the number of times −1 is part of the product in the numerator. 

This essentially proves the following proposition. 

Proposition 23.1.4 If $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ then a nice formula for $\mu(n)$ is 

$$\mu(n) = \begin{cases} 0 & \text{if any } e_i > 1 \\ (-1)^k & \text{otherwise} \end{cases}$$

Proof. See above. ∎ 
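The formula in the proposition translates almost word for word into code. Here is a minimal sketch in plain Python, assuming a simple trial-division helper we call `prime_factorization` (our own name); in Sage one would normally just call the built-in `moebius`.

```python
def prime_factorization(n):
    """Return {prime: exponent} for a positive integer n, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def mu(n):
    """Moebius function: 0 if any exponent exceeds 1, else (-1)^k."""
    factors = prime_factorization(n)
    if any(e > 1 for e in factors.values()):
        return 0
    return (-1) ** len(factors)

print([mu(n) for n in range(1, 13)])
```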


1 mathshistory.st-andrews.ac.uk/Biographies/Listing/ 
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23.1.3 Another definition 


The $\mu$ function is so important that we will want several more approaches as 
well. It is a mark of an important concept that there are ways to define it from 
many directions. 

One important way that $\mu$ is often defined is via a recurrence relation. That 
is, one defines 

$$\mu(1) = 1, \text{ and } \sum_{d \mid n} \mu(d) = 0 \text{ for } n > 1.$$

Now, we haven’t proved this identity yet, and probably the reader hasn’t even 
noticed it. But if we can prove the identity works for $\mu$, then since $\mu(1) = 1$ is 
true, this would give an alternate definition. 


Proposition 23.1.5 Recursive definition of $\mu$. We can define $\mu$ by setting 
$\mu(1) = 1$ and, for n > 1, 

$$\sum_{d \mid n} \mu(d) = 0.$$

Proof. Let’s rewrite the sum $\sum_{d \mid n} \mu(d)$ by trying to omit the $\mu(d)$ that 
equal zero. If we do this, the sum reduces to the long, but correct, 

$$\sum_{d \mid n} \mu(d) = \sum_{\substack{\text{all divisors } d \text{ with just one or zero} \\ \text{of each prime factor } p_i \text{ of } n}} (-1)^{\text{the number of primes dividing } d}$$

Now let’s set up a little notation. First, let’s borrow from Definition 23.3.3 
the notation $\omega(d)$ for the number of distinct prime divisors of a divisor d of n. 
Next, for convenience we will write $k = \omega(n)$ for the number of (again, distinct) 
prime divisors of n itself. 

Then the crazy sum $\sum_{d \mid n} \mu(d)$ becomes easier to write: 

$$\sum_{\substack{\text{all divisors } d \text{ with just one or zero} \\ \text{of each prime factor } p_i \text{ of } n}} (-1)^{\omega(d)}$$

If at this point you are asking yourself why I bothered introducing k, you may 
want to think about that briefly while reading the next formula: 

$$\sum_{\substack{\text{all divisors } d \text{ with just one or zero} \\ \text{of each prime factor } p_i \text{ of } n}} (-1)^{\omega(d)} = \sum_{d \text{ that work}} (1)^{k - \omega(d)} (-1)^{\omega(d)}$$

Note that $(k - \omega(d)) + \omega(d) = k$. 

Let’s step back for a rationale for all this manipulation. Consider each of the 
divisors d with no square factors (the ones in question that we are indexing 
by). Each of these has $\omega(d)$ of the prime factors of n; that means that they 
do not have the other $k - \omega(d)$ possible prime factors available to us from n. 
So in the expression $(1)^{k - \omega(d)} (-1)^{\omega(d)}$ we are, in some sense, picking a subset 
(of size $\omega(d)$) of the primes dividing n and multiplying by −1 for each of those; 
likewise we multiply by 1 for each possible prime not picked. 

This is a combinatorial point of view, which means we can count all this picking 
another way. Instead, consider just picking a subset of {1, 2, ..., k} and 
assigning −1 to the chosen indices and +1 to the rest; that would be the same thing. 
However, we can reinterpret that as picking a particular term in the full expansion 
of the kth power of the binomial 1 + (−1): 

$$(1 + (-1))^k = (1 + (-1))(1 + (-1)) \cdots (1 + (-1)) \quad (k \text{ times, for } p_1, p_2, \ldots, p_k).$$

Summing all the possible terms must be the same as calculating this power, so 
an easy computation finishes the proof: 

$$\sum_{d \text{ that work}} (1)^{k - \omega(d)} (-1)^{\omega(d)} = (1 + (-1))^k = 0.$$
∎ 

Sage note 23.1.6 Check your work again. Remember, we can always 
check calculations like this with our computational assistant. 


(moebius(30) + moebius(15) + moebius(10) + moebius(6) + 
 moebius(5) + moebius(3) + moebius(2) + moebius(1)) 
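The recursive definition can also be used directly as an algorithm: for n > 1, solve the relation for the one unknown term. The following plain-Python sketch does this; the name `mu_recursive` and the naive `divisors` helper are our own, written only to illustrate the recurrence.

```python
from functools import lru_cache

def divisors(n):
    """All positive divisors of n, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

@lru_cache(maxsize=None)
def mu_recursive(n):
    """mu(1) = 1; for n > 1, solve sum_{d | n} mu(d) = 0 for mu(n)."""
    if n == 1:
        return 1
    return -sum(mu_recursive(d) for d in divisors(n) if d < n)

print([mu_recursive(n) for n in range(1, 13)])
print(sum(mu_recursive(d) for d in divisors(30)))
```

Comparing the first line of output with values from the formula in Proposition 23.1.4 is a good way to convince yourself the two definitions agree.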


Fact 23.1.7 The function $\mu$ is multiplicative. 

Proof. We will postpone a formal proof of this to a much bigger theorem, from 
which this result (Corollary 23.4.15) will fall “for free”. ∎ 

Let’s check it: 


print(gcd(111,41)) 
print(moebius(111)*moebius(41)==moebius(41*111)) 


1 
True 


23.2 Inverting Functions 


The main point of the Moebius function is the following famous theorem. 


Theorem 23.2.1 Möbius Inversion Formula. If $f(n) = \sum_{d \mid n} g(d)$, then 

$$g(n) = \sum_{d \mid n} \mu(d)\, f\left(\frac{n}{d}\right).$$

Proof. The proof is delayed to Subsection 23.2.2. ∎ 

We can interpret this result briefly as follows. Suppose you sum an arithmetic 
function over the set of the (positive) divisors of n to create a new 
function of n. Then summing that function over divisors, along with $\mu$, gives 
you back the original function. 

The reason we care about this is that we are able to use the $\mu$ function to 
get new, useful, arithmetic functions via this theorem. In particular, we can 
“invert” all of our usual arithmetic functions, and this will lead to some very 
powerful applications. 


Example 23.2.2 If we apply this theorem to 

$$\tau(n) = \sum_{d \mid n} 1 = \sum_{d \mid n} u(d)$$

(recall Definition 19.2.9) then it implies 

$$\sum_{d \mid n} \mu(d)\, \tau\left(\frac{n}{d}\right) = 1.$$

This is worth checking by hand or with Sage. Somehow, mysteriously, the 
number of divisors weighted by the $\mu$ function nearly balances out. 
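One way to carry out that check is the following plain-Python sketch. The helpers `divisors`, `mu`, `tau`, and `inverted_tau` are our own illustrative names; in Sage, `moebius` and `sigma(n, 0)` would play the roles of `mu` and `tau`.

```python
def divisors(n):
    """All positive divisors of n, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def mu(n):
    """Moebius function: 0 if a square divides n, else (-1)^(number of primes)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:  # p^2 divided the original n
                return 0
            result = -result
        p += 1
    if n > 1:  # one leftover prime factor
        result = -result
    return result

def tau(n):
    """Number of positive divisors of n."""
    return len(divisors(n))

def inverted_tau(n):
    """Compute sum_{d | n} mu(d) * tau(n / d)."""
    return sum(mu(d) * tau(n // d) for d in divisors(n))

print([inverted_tau(n) for n in range(1, 21)])
```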
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23.2.1 Some useful notation 


In order to better understand what this theorem is saying, let’s introduce some 
notation. 


Definition 23.2.3 Dirichlet product. Let f and g be arithmetic functions. 
Then we define the new function $f \star g$, the Dirichlet product, via the formula 

$$(f \star g)(n) = \sum_{d \mid n} f(d)\, g\left(\frac{n}{d}\right) = \sum_{de = n} f(d)\, g(e).$$


Example 23.2.4 For example, if we recall u(n) = 1 and N(n) = n from 
Definition 19.2.9², then 

$$(\phi \star u)(n) = \sum_{d \mid n} \phi(d)\, u\left(\frac{n}{d}\right) = \sum_{d \mid n} \phi(d) = n = N(n).$$

We saw this originally in Fact 9.5.4, but now we can write it concisely as 
$\phi \star u = N$ and see it is part of a bigger context. (See also Fact 23.3.2.) 


This notation, like all the best notation, practically demands that we restate 
the inversion theorem in a very insightful way: 


If $f = g \star u$, then $g = f \star \mu$. 


23.2.2 Proof of Moebius inversion 


Now we are ready to prove the Möbius Inversion Formula, following the standard 
proof, as for example in [E.2.1]. 

Let’s expand the formula for g(n) the theorem would give, in terms of g 
itself. 

$$\sum_{d \mid n} \mu(d)\, f\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d) \left(\sum_{e \mid \frac{n}{d}} g(e)\right)$$

Each time g(e) appears in this sum, it has a coefficient of $\mu(d)$. How often 
does this happen, and what is d anyway? 

If $e \mid \frac{n}{d}$, then $e \mid n$, which means $\frac{n}{e}$ is an integer. However, this integer 
must have at least a factor of d “left” in it (after division by e). Why? Since 
e divides $\frac{n}{d}$, we have $ed \mid n$, in which case certainly $d \mid \frac{n}{e}$. 

So g(e) shows up once for each $d \mid \frac{n}{e}$, with coefficient $\mu(d)$. Thus, 

$$\sum_{d \mid n} \mu(d)\, f\left(\frac{n}{d}\right) = \sum_{e \mid n} \left(\sum_{d \mid \frac{n}{e}} \mu(d)\right) g(e).$$

Here comes the final step. Unless $\frac{n}{e} = 1$, we have $\sum_{d \mid \frac{n}{e}} \mu(d) = 0$. So the only 
subsum in this double sum that sticks around is the term for $\frac{n}{e} = 1$, or when 
e = n. 
Thus the whole sum collapses to g(n), as desired! 


2 See also Definition 23.3.1. 


CHAPTER 23. NEW FUNCTIONS FROM OLD 408 


23.3 Making New Functions 


23.3.1 First new functions 


In order to see what good this does, let’s see what happens when we mess 
around and make Dirichlet products with functions we know. We already 
know two of these functions, and I give you a third. 


Definition 23.3.1 We define a new simple arithmetic function to go along 
with those from Definition 19.2.9. 

• u(n) = 1 for all n 
• N(n) = n for all n 
• $I(n) = \begin{cases} 1 & n = 1 \\ 0 & n > 1 \end{cases}$ 

In the next computational cell, we define these using Sage (recall Sage 
note 11.1.1), as well as a Dirichlet product function. 


def u(n): return 1 

def N(n): return n 

def I(n): return floor(1/n) 

def DirichletProduct(f,g,n): return sum(f(d)*g(n/d) for d in 
divisors(n)) 


Now let’s see what we get! For instance, what happens if we look for the 
inverse of N? (You can try it by hand too, of course.) 


@interact 
def _(n=10): 
    H = [['$i$',r'$(N\star \mu)(i)$']] 
    T = [(i, DirichletProduct(N,moebius,i)) for i in [1..n]] 
    pretty_print(html(table(H+T, header_row=True, frame=True))) 


Maybe this is a surprise! But this makes sense, if you remember Example 23.2.4 
just previously about $N = \phi \star u$. Let’s confirm that fact numerically 
as well. 


@interact 
def _(n=10): 
    H = [['$i$',r'$(\phi\star u)(i)$']] 
    T = [(i, DirichletProduct(u,euler_phi,i)) for i in [1..n]] 
    pretty_print(html(table(H+T, header_row=True, frame=True))) 

We summarize these explanations as follows. 


Fact 23.3.2 We may identify the following Dirichlet products as known functions. 

• $\phi \star u = N$ 
• $N \star \mu = \phi$ 


Both parts of Fact 23.3.2 can be connected to work from much earlier. The 
first part is another proof of Fact 9.5.4, while the second part gives an alternate 
proof for our formula for $\phi$ from Exercise 9.6.11: 

$$\phi(n) = (N \star \mu)(n) = \sum_{d \mid n} N(d)\, \mu\left(\frac{n}{d}\right) = \sum_{d \mid n} d\, \mu\left(\frac{n}{d}\right) = \sum_{e \mid n} \frac{n}{e}\, \mu(e) = n \sum_{e \mid n} \frac{\mu(e)}{e} = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right).$$

The middle step follows if we let e = n/d, since that sum will also go through 
all divisors of n. The last step follows from our initial definition of $\mu$ in 
Definition 23.1.1. 
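Both identities are easy to spot-check outside of the interactive cells too. The sketch below is plain Python with our own brute-force helpers (`divisors`, `mu`, `phi`, `dirichlet_product`); it mirrors the Sage function `DirichletProduct` above but uses integer division so that no Sage types are needed.

```python
from math import gcd

def divisors(n):
    """All positive divisors of n, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def mu(n):
    """Moebius function: 0 if a square divides n, else (-1)^(number of primes)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def phi(n):
    """Euler phi, counted directly from the definition."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def u(n): return 1
def N(n): return n

def dirichlet_product(f, g, n):
    """Compute (f star g)(n) = sum_{d | n} f(d) g(n/d)."""
    return sum(f(d) * g(n // d) for d in divisors(n))

# Columns: n, (phi star u)(n), (N star mu)(n), phi(n)
for n in range(1, 13):
    print(n, dirichlet_product(phi, u, n), dirichlet_product(N, mu, n), phi(n))
```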


23.3.2 More new functions 


Next, please try computing the Moebius inversions of our old friends, $\sigma$ and $\tau$, 
by hand for several values. (Hint: try primes and perfect powers first, as they 
don’t have many divisors!) 

You can try something out here in Sage as well. 


If you are online, in the next few cells one can try this interactively. (If you 
get an error, you'll need to evaluate the earlier cell after Definition 23.3.1.) 


@interact 
def _(n=10): 
    H = [['$i$',r'$(\tau\star \mu)(i)$']] 
    T = [(i,DirichletProduct(lambda y: sigma(y,0),moebius,i)) for i in [1..n]] 
    pretty_print(html(table(H+T, header_row=True, frame=True))) 


@interact 
def _(n=10): 
    H = [['$i$',r'$(\sigma\star \mu)(i)$']] 
    T = [(i, DirichletProduct(sigma,moebius,i)) for i in [1..n]] 
    pretty_print(html(table(H+T, header_row=True, frame=True))) 


There is a load of fun to be had here. We could try to see what $\mu \star \mu$ is, or 
$u \star u$. Could there be a formula for $|\mu|$, or could we calculate $|\mu| \star u$? 


@interact 
def _(n=10): 
    H = [['$i$',r'$(\mu\star \mu)(i)$']] 
    T = [(i, DirichletProduct(moebius,moebius,i)) for i in [1..n]] 
    pretty_print(html(table(H+T, header_row=True, frame=True))) 


@interact 
def _(n=10): 
    H = [['$i$',r'$(u\star u)(i)$']] 
    T = [(i,DirichletProduct(u,u,i)) for i in [1..n]] 
    pretty_print(html(table(H+T, header_row=True, frame=True))) 


It turns out you can define all kinds of other functions. We already saw the 
first of these informally in our discussion of the Moebius function in Proposi- 
tion 23.1.5. 


Definition 23.3.3 If 

$$n = \prod_{i=1}^{k} p_i^{e_i}$$

then we can give the name $\omega(n) = k$ to the number of unique prime divisors 
of an integer. (This is sometimes called $\nu(n)$ in the literature.) 


Definition 23.3.4 If $n = \prod_{i=1}^{k} p_i^{e_i}$, we summarize the parity of the total 
powers of primes dividing a number as 

$$\lambda(n) = (-1)^{e_1 + e_2 + \cdots + e_k}.$$

This is called Liouville’s function. 

In both cases, you might want to try a few values to see what these functions 
look like. See Exercise 23.5.1, or pursue these ideas: 

• What is the value for primes? 
• What is the $\star$ product of this with something, say, u? 


Finally, we provide some Sage cells to try things out; the first one defines 
our functions, and the interact lets you explore. Then again, you should try 
them not just with Sage, but also by hand; this is part of the allure of number 
theory. The sky’s the limit. Enjoy! 


def u(n): return 1 
def N(n): return n 
def I(n): return floor(1/n) 
def omega(n): return len(n.prime_divisors()) 
def liouville(n): return (-1)**sum([z[1] for z in n.factor()]) 
def DirichletProduct(f,g,n): return sum(f(d)*g(Integer(n/d)) for d in divisors(n)) 


@interact 
def _(n=10,f=[liouville,u,N,moebius,omega,I], g=[liouville,u,N,moebius,omega,I]): 
    H = [['$i$',r'$(%s\star %s)(i)$'%(f,g)]] 
    T = [(i, DirichletProduct(f,g,i)) for i in [1..n]] 
    pretty_print(html(table(H+T, header_row=True, frame=True))) 


23.4 Generalizing Moebius 


There is a more serious side to the panoply of new functions, though. This 
is our key to arithmetic functions. We will now turn to algebra again, with a 
goal of generalizing the Moebius result. 
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23.4.1 The monoid of arithmetic functions 


Definition 23.4.1 A commutative monoid is a set with multiplication (an 
operation) that has an identity and is associative and commutative. 

You can think of a commutative monoid as an Abelian group without re- 
quiring inverses. (That means it’s not necessarily a group, though it could be; 
see Definition 8.3.3.) 


Theorem 23.4.2 Let A be the set of all arithmetic functions. Then $\star$ turns 
the set A into a commutative monoid. 

Proof. The function I(n), which is equal to zero except when n = 1, plays the 
role of identity. Then one would need to prove the following three statements. 

• $f \star g = g \star f$ 
• $(f \star g) \star h = f \star (g \star h)$ 
• $f \star I = f = I \star f$ 

We include one of the proofs. The others are similar; see Exercise 23.5.2. 
Note that for the second one, one can use the fact that dc = n, ab = d implies 
abc = n. 

Proof of commutativity: 

$$(f \star g)(n) = \sum_{d \mid n} f(d)\, g\left(\frac{n}{d}\right) = \sum_{de = n} f(d)\, g(e) = \sum_{de = n} g(e)\, f(d) = \sum_{e \mid n} g(e)\, f\left(\frac{n}{e}\right) = (g \star f)(n)$$
∎ 
Can you think of other commutative monoids? What sets have an operation 
and an identity, but no inverse? 
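Before proving the remaining two properties, it can be reassuring to test all three numerically. The following plain-Python sketch does so for a few sample functions; the helper `dirichlet_product` (here returning a function rather than a value) and the test function `square` are our own hypothetical choices, and of course a finite check is evidence, not a proof.

```python
def divisors(n):
    """All positive divisors of n, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def dirichlet_product(f, g):
    """Return the arithmetic function f star g."""
    def product(n):
        return sum(f(d) * g(n // d) for d in divisors(n))
    return product

def I(n): return 1 if n == 1 else 0   # the identity for star
def u(n): return 1
def N(n): return n
def square(n): return n * n           # an extra sample function

star = dirichlet_product
for n in range(1, 31):
    assert star(u, N)(n) == star(N, u)(n)                            # commutative
    assert star(star(u, N), square)(n) == star(u, star(N, square))(n)  # associative
    assert star(N, I)(n) == N(n) == star(I, N)(n)                    # identity
print("all three monoid properties hold for n = 1..30 on these samples")
```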


23.4.2 Bringing in group structure 


Let’s get deeper into the algebraic structure behind the $\star$ operation. Remember, 
$f \star g$ is defined by 

$$(f \star g)(n) = \sum_{de = n} f(d)\, g(e).$$

The reason this structure is so neat is that it actually allows us to generalize the 
idea behind the Moebius function! 


Theorem 23.4.3 If f is an arithmetic function and $f(1) \neq 0$, then f has an 
inverse in the set A under the operation $\star$. We call this inverse $f^{-1}$. It is given 
by the following recursive definition: 

$$\begin{cases} f^{-1}(1) = \dfrac{1}{f(1)} & n = 1 \\[1ex] \displaystyle\sum_{d \mid n} f^{-1}(d)\, f\left(\frac{n}{d}\right) = \sum_{de = n} f^{-1}(d)\, f(e) = 0 & n > 1 \end{cases}$$

Proof. As in all the best theorems, there is really nothing to prove. The 
definitions for n > 1 are equivalent ways of representing the same thing. We 
can always get the next value of $f^{-1}(n)$ by knowledge of $f^{-1}(d)$ for the proper 
divisors d of n (solving for the d = n term means dividing by $f(1) \neq 0$), and 
that is enough for an induction proof, since we do have a formula given for 
$f^{-1}(1)$. (See Exercise 23.5.9.) ∎ 
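The recursion in the theorem is effectively an algorithm, and here is one way it might be sketched in plain Python. The name `dirichlet_inverse` is our own; exact rational arithmetic from the standard `fractions` module is an implementation choice so that functions with f(1) other than ±1 are handled correctly.

```python
from fractions import Fraction
from functools import lru_cache

def divisors(n):
    """All positive divisors of n, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def dirichlet_inverse(f):
    """Return the Dirichlet inverse of f, assuming f(1) != 0."""
    if f(1) == 0:
        raise ValueError("f(1) = 0, so f has no Dirichlet inverse")

    @lru_cache(maxsize=None)
    def inverse(n):
        if n == 1:
            return Fraction(1, f(1))
        # solve sum_{d | n} inverse(d) * f(n/d) = 0 for the d = n term
        partial = sum(inverse(d) * f(n // d) for d in divisors(n) if d < n)
        return -partial / f(1)

    return inverse

def u(n): return 1
def N(n): return n

u_inverse = dirichlet_inverse(u)
N_inverse = dirichlet_inverse(N)
print([int(u_inverse(n)) for n in range(1, 13)])
print([int(N_inverse(n)) for n in range(1, 13)])
```

The first list should look familiar; the second is a good hint for Exercise 23.5.4, so you may prefer to try that exercise by hand before running it.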


Corollary 23.4.4 This can be immediately used to show that the Moebius 
function $\mu$ is $\mu = u^{-1}$ (and hence $u = \mu^{-1}$). 


Corollary 23.4.5 Since $\omega(1) = 0$, the function $\omega$ has no inverse. 

This is a good time to try to figure out what the inverse of N or $\phi$ is with 
paper and pencil. See Exercise 23.5.4 and Exercise 23.5.5. 

In general, we can also say that 

$$f \star f^{-1} = I = f^{-1} \star f.$$

There is another, more theoretical, implication too, hearkening back to Section 8.3. 

Corollary 23.4.6 The subset of A which consists of all arithmetic functions 
with $f(1) \neq 0$ is actually a group. 

Remark 23.4.7 Much of this chapter is done in slightly variant ways in intro- 
ductory books, at a similar level. For a higher-level but useful and readable 
account of the ring theory of arithmetic functions (including valuations and 
derivations), see [E.2.8, Chapters 3 and 4]. For good exercises see [E.4.6, Chap- 


ter 2] or [E.2.9, Chapter 2]; for instance, the latter asks for identifying the 
idempotents of A. 


23.4.3 More dividends from structure 


This new way of looking at things yields an immediate slew of information 
about arithmetic functions. The following results will yield dividends about 
number theory and analysis/calculus (no, we haven’t forgotten that!) in the 
next chapter on Infinite Sums and Products. 


Fact 23.4.8 The Moebius inversion formula, that if $f = g \star u$ then $g = f \star \mu$, 
can be proved concisely by 

$$g = g \star I = g \star u \star \mu = f \star \mu.$$

(We need no parentheses, since $\star$ is associative.) 


Fact 23.4.9 Conversely, if $g = f \star \mu$, then 

$$f = f \star I = f \star \mu \star u = g \star u,$$

so the inversion formula is true in both directions. 


Proposition 23.4.10 If g and h are multiplicative, then $f = g \star h$ is also 
multiplicative. 

Proof. See Exercise 23.5.8. ∎ 
The next result has a long proof, but most of it is following the definitions 

and keeping careful track of indices. See [E.2.1, Exercise 8.20] or [E.2.13, Chap- 

ter 5.3] for similar approaches. 


Proposition 23.4.11 If f is multiplicative and $f(1) \neq 0$, then $f^{-1}$ is also 
multiplicative. 

Proof. This basically can be done by induction, but each step is somewhat 
involved so we will break this into several lemmata. Throughout, recall that 
the inverse is defined by 

$$f^{-1}(1) = \frac{1}{f(1)}$$

and, for n > 1, the condition 

$$\sum_{d \mid n} f^{-1}(d)\, f\left(\frac{n}{d}\right) = \sum_{de = n} f^{-1}(d)\, f(e) = 0.$$

First, in Lemma 23.4.12 we will show that $f^{-1}(1)$ behaves well. 

Then, assuming as an inductive hypothesis that $f^{-1}$ is multiplicative for inputs 
less than mn, with gcd(m, n) = 1, we will show in Lemma 23.4.13 that 

$$f^{-1}(mn) = -\sum_{\substack{(ac)(bd) = (m)(n) \\ ab < mn,\ a \mid m,\ b \mid n}} f^{-1}(a)\, f^{-1}(b)\, f(c)\, f(d).$$

Finally, in Lemma 23.4.14 we will show how to rewrite this as 

$$f^{-1}(mn) = f^{-1}(m)\, f^{-1}(n),$$

which finishes the induction argument. ∎ 


Lemma 23.4.12 We know that both $f^{-1}(1) = \frac{1}{f(1)}$ and $f(1) = 1 = f^{-1}(1)$. 

Proof. Left to the reader in Exercise 23.5.10; use everything you know about 
f. ∎ 


Lemma 23.4.13 Assume as above that $f^{-1}$ is multiplicative for inputs less 
than mn, with gcd(m, n) = 1. Then 

$$f^{-1}(mn) = -\sum_{\substack{(ac)(bd) = (m)(n) \\ ab < mn,\ a \mid m,\ b \mid n}} f^{-1}(a)\, f^{-1}(b)\, f(c)\, f(d).$$

Proof. Assume that m, n > 1 and coprime. By the definition of inverse, we 
have 

$$0 = (f^{-1} \star f)(mn) = \left(\sum_{\substack{x < mn \\ xy = mn}} f^{-1}(x)\, f(y)\right) + f^{-1}(mn)\, f(1).$$

By assumption, every function in this expression (both f and $f^{-1}$) is 
multiplicative on the values in question, with the possible exception of $f^{-1}(mn)$. 
We can use this effectively because each summand is for a divisor $x \mid mn$, 
which we can write as xy = mn. Since m and n are coprime, both x and y are 
themselves products of coprime divisors dividing m and n respectively. 

So let x = ab and y = cd, where $a, c \mid m$ and $b, d \mid n$. Then, as everything is 
multiplicative, $f^{-1}(x)\, f(y) = f^{-1}(a)\, f^{-1}(b)\, f(c)\, f(d)$. 

Since by the previous lemma f(1) = 1, we can subtract the summation from 
both sides of the equation whose left-hand side is zero at the beginning of this 
lemma’s proof, yielding 

$$f^{-1}(mn) = -\sum_{\substack{(ac)(bd) = (m)(n) \\ ab < mn,\ a \mid m,\ b \mid n}} f^{-1}(a)\, f^{-1}(b)\, f(c)\, f(d).$$
∎ 


Lemma 23.4.14 Under the same hypotheses as before, $f^{-1}(mn) = f^{-1}(m)\, f^{-1}(n)$. 

Proof. We now write all this in terms of things we already can evaluate. 

If the sum in question were summed over every ab ≤ mn instead of ab < mn, 
it would easily simplify as a product: 

$$\sum_{\substack{(ac)(bd) = (m)(n) \\ a \mid m,\ b \mid n}} f^{-1}(a)\, f^{-1}(b)\, f(c)\, f(d) = \left(\sum_{ac = m} f^{-1}(a)\, f(c)\right) \left(\sum_{bd = n} f^{-1}(b)\, f(d)\right)$$

The sum in Lemma 23.4.13 only lacks the term with a = m, b = n, in fact. So 

$$\sum_{\substack{(ac)(bd) = (m)(n) \\ ab < mn,\ a \mid m,\ b \mid n}} f^{-1}(a)\, f^{-1}(b)\, f(c)\, f(d) = \left[\sum_{ac = m} f^{-1}(a)\, f(c)\right] \left[\sum_{bd = n} f^{-1}(b)\, f(d)\right] - f^{-1}(m)\, f^{-1}(n)\, f(1)\, f(1)$$

Now we can plug this back into the previous characterization of $f^{-1}(mn)$: 

$$f^{-1}(mn) = -\left(\left[\sum_{ac = m} f^{-1}(a)\, f(c)\right] \left[\sum_{bd = n} f^{-1}(b)\, f(d)\right] - f^{-1}(m)\, f^{-1}(n)\, f(1)\, f(1)\right)$$

Since m, n > 1, the individual sums may be rewritten as 

$$(f^{-1} \star f)(m) = I(m) = 0 = I(n) = (f^{-1} \star f)(n)$$

That means we achieve the desired result 

$$f^{-1}(mn) = f^{-1}(m)\, f^{-1}(n)\, f(1)\, f(1) = f^{-1}(m)\, f^{-1}(n)$$
∎ 

Finally, we get the following promised corollary from the beginning of the 
chapter, Fact 23.1.7. 

Corollary 23.4.15 The function $\mu$ is multiplicative. 

Proof. This follows since u is multiplicative (trivially) and $\mu = u^{-1}$. ∎ 
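As a sanity check on Proposition 23.4.11 for a less trivial function than u, one can compute the Dirichlet inverse of $\sigma$ from the recursion and test multiplicativity on coprime pairs. The sketch below is plain Python; `sigma_inverse` is our own illustrative name, and since $\sigma(1) = 1$ no division is needed in the recursion.

```python
from functools import lru_cache
from math import gcd

def divisors(n):
    """All positive divisors of n, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def sigma(n):
    """Sum of the positive divisors of n."""
    return sum(divisors(n))

@lru_cache(maxsize=None)
def sigma_inverse(n):
    """Dirichlet inverse of sigma, via the recursion of Theorem 23.4.3."""
    if n == 1:
        return 1
    return -sum(sigma_inverse(d) * sigma(n // d) for d in divisors(n) if d < n)

coprime_pairs = [(m, n) for m in range(1, 40) for n in range(1, 40)
                 if gcd(m, n) == 1]
print(all(sigma_inverse(m * n) == sigma_inverse(m) * sigma_inverse(n)
          for m, n in coprime_pairs))
print([sigma_inverse(n) for n in range(1, 13)])
```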


23.5 Exercises 


1. Factoring by hand, compute the first 24 values of $\lambda$ and $\omega$ (recall 
Definition 23.3.4 and Definition 23.3.3). 

2. Finish the proof that the set of arithmetic functions is a commutative 
monoid in Theorem 23.4.2. 

3. Show that if $f = g \star u$ (equivalently, if $g = f \star \mu$), then f and g are either 
both multiplicative or both not. Strategy hint: Use Proposition 23.4.11. 

4. Do enough calculations without using electronic devices to discover a formula 
(in terms of functions we already know) for the inverse of N. 

5. Do enough calculations without using electronic devices to discover a formula 
(in terms of functions we already know) for the inverse of $\phi$. 

6. Show that the inverse of $\lambda(n)$ from Definition 23.3.4 is a variant of another 
of our new functions. 

7. Can you identify $\omega \star \mu$ as anything familiar? (Recall Definition 23.3.3.) If 
yes, then try to prove it; if not, explain why you think it is new to us. 

8. Prove Proposition 23.4.10 that using the Dirichlet product on two multiplicative 
functions stays multiplicative. 

9. Complete all details of the proof of Theorem 23.4.3 defining inverses under 
the $\star$ product. 

10. Prove Lemma 23.4.12. 

11. Recall from Exercise 18.3.5 the function $D(n) = (-1)^{n-1}$. Compute $\mu \star D$, 
and prove a formula if you can. (For further exercises, see Exercise 
Group 24.7.9-10.) 

12. Come up with another good exercise for this chapter and have a friend 
try it! 

Summary: New Functions from Old 


In this chapter, we see a lot more arithmetic functions, and how to tackle 
them systematically. 


1. Which definition of the $\mu$ function do you prefer: Definition 23.1.1, 
Proposition 23.1.4, or the Recursive definition of $\mu$? 


2. The next section puts the Möbius function into context, including the 
definition of the Dirichlet product. 


3. Finally, we prove quite general results about combining arithmetic func- 
tions, including Proposition 23.4.10. 


The Exercises are particularly interesting because you have the chance to see 
that many combinations of functions give ones you already know. 
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Chapter 24 


Infinite Sums and Products 


We are almost at the very frontiers of serious number theory research now. In 
order to start to understand this, we will need to introduce two final concepts: 


• Euler Products 
• Dirichlet Series 


These concepts both deeply involve infinitely applied operations, and are what 
this chapter is about. If you wish, think of this chapter as the ‘infinite’ version 
of the previous chapter on new functions. 


24.1 Products and Sums 


In order to motivate bringing infinite processes to this part of number theory, 
let’s step back a bit. Many functions we have already seen may be thought of 
in two ways — either as a product or as a sum. 


24.1.1 Products 


Let $p \mid n$ as an indexing tool denote the set of primes which divide 
$n = \prod_{p \text{ prime}} p^e$ (recall Example 6.3.4). Then we have the following product 
representation of two familiar arithmetic functions. (Recall Theorem 19.2.5 and 
Fact 18.1.1.) 

$$\sigma(n) = \prod_{p \mid n} \left(\frac{p^{e+1} - 1}{p - 1}\right) = \prod_{p \mid n} (1 + p + p^2 + \cdots + p^e)$$

$$\phi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right)$$

Both of these functions therefore may be thought of as (finite) products. 
As a related example, we explicitly wrote out the product for the abundancy 
index in Section 19.3. 

$$\frac{\sigma(n)}{n} = \prod_{p \mid n} \left(\frac{p^{e+1} - 1}{p^e (p - 1)}\right) = \prod_{p \mid n} \frac{p - (1/p^e)}{p - 1}$$

Alternately, to avoid fractions: 

$$\frac{\sigma(n)}{n} = \frac{\prod_{p \mid n} (1 + p + p^2 + \cdots + p^e)}{\prod_{p \mid n} p^e} = \prod_{p \mid n} (1 + p^{-1} + p^{-2} + \cdots + p^{-e})$$

Note that $\frac{\phi(n)}{n} = \prod_{p \mid n} (1 - p^{-1})$. 


24.1.2 Products that are sums 


On the other hand, these products over primes are also sums over divisors; this 
is true either by definition or by theorem, depending on how you look at it. 
It’s clear with $\sigma$ that this is the case, since we defined (in Definition 19.1.1)


$$\sigma(n)=\sum_{d\mid n}d$$

We can even cleverly add up the divisors in the opposite order to get the slightly
more felicitous

$$\sigma(n)=\sum_{d\mid n}\frac{n}{d}=n\sum_{d\mid n}\frac{1}{d}.$$

This led us directly to writing $\frac{\sigma(n)}{n}=\sum_{d\mid n}\frac{1}{d}$ in
Fact 19.4.9; now we can also write it as
$\frac{\sigma(n)}{n}=\sum_{d\mid n}\frac{u(d)}{d}$.


With $\phi$ we have something to prove to make this connection, but not much.
In Fact 23.3.2 we saw that $\phi\star u=N\Rightarrow\phi=N\star\mu$. Equivalently, we have
Möbius-inverted Fact 9.5.4 to obtain, from $\sum_{d\mid n}\phi(d)=n$, the formula


$$\sum_{d\mid n}d\,\mu\left(\frac{n}{d}\right)=\phi(n)$$


By adding the divisors in the opposite order (alternately, by noting $\star$ is
commutative) we can write


$$\phi(n)=\sum_{d\mid n}\mu(d)\left(\frac{n}{d}\right)=n\sum_{d\mid n}\frac{\mu(d)}{d}$$

which allows us to also write the fraction as


$$\frac{\phi(n)}{n}=\sum_{d\mid n}\frac{\mu(d)}{d}.$$


Now, in some sense we already knew all this. Great, some arithmetic func- 
tions can be represented either as a sum over divisors or as a product over 
primes, depending on what you need from them. So what? 

The genius of Euler was to directly connect these ideas. 


Fact 24.1.1 We can equate sums over divisors and products over primes to
obtain special formulas. Given $n=\prod_{p\text{ prime}}p^e$, we have


$$\frac{\phi(n)}{n}=\prod_{p\mid n}\left(1-\frac{1}{p}\right)=\sum_{d\mid n}\frac{\mu(d)}{d}$$

$$\frac{\sigma(n)}{n}=\prod_{p\mid n}\left(1+\frac{1}{p}+\frac{1}{p^2}+\cdots+\frac{1}{p^e}\right)=\sum_{d\mid n}\frac{1}{d}$$


Well, this was almost the genius; his real genius was to take these ideas to
the limit!

One can’t really take these expressions to infinity as they stand — one would 
get massive divergence. So what can we do? To analyze this, we will define new, 
related functions which preserve the summation, but allow for convergence. 
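
Before passing to the limit, it can be reassuring to confirm both identities of Fact 24.1.1 exactly for a few small n. The following is only a minimal sketch in plain Python (it should also run unchanged in a Sage cell); the helper names `factorize`, `mu`, `divisors` and `check_fact` are our own choices for illustration, not functions from the text. Exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction

def factorize(n):
    """Return the prime factorization of n as a dict {prime: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def mu(n):
    """Moebius function: 0 if a square divides n, else (-1)^(number of prime factors)."""
    f = factorize(n)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

def divisors(n):
    """All positive divisors of n (fine for small n)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def check_fact(n):
    """Return (phi product, phi sum, sigma product, sigma sum) as exact fractions."""
    phi_product, sigma_product = Fraction(1), Fraction(1)
    for p, e in factorize(n).items():
        phi_product *= 1 - Fraction(1, p)
        sigma_product *= sum(Fraction(1, p ** k) for k in range(e + 1))
    phi_sum = sum(Fraction(mu(d), d) for d in divisors(n))
    sigma_sum = sum(Fraction(1, d) for d in divisors(n))
    return phi_product, phi_sum, sigma_product, sigma_sum

for n in [12, 16, 18, 20, 30]:
    print(n, check_fact(n))
```

In each printed row the first two entries should agree (both are $\phi(n)/n$), and so should the last two (both are $\sigma(n)/n$).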


24.2 The Riemann Zeta Function 


24.2.1 A fundamental function 


The most important such infinite process is the following fundamental func- 
tion. It is one of the most studied, yet most mysterious functions in all of 
mathematics. 


Definition 24.2.1 Riemann zeta function. We define the zeta function
(denoted $\zeta$) as the sum of the infinite series

$$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}$$

as a function of s.

For now we’ll keep the domain of $\zeta$ to be only the s where this series
converges. Later, in Subsection 25.3.1, we’ll see that it will be useful to think
about what $\zeta$ might mean for other values of s. ◊


Here we plot the function for a few positive values of s.


Figure 24.2.2 The Riemann zeta function


plot(zeta, 0,4, ymin=-1, ymax=10)


Historical remark 24.2.3 Bernhard Riemann. Riemann, the quietly 
devout son of a Lutheran pastor, made ground-breaking contributions in nearly 
every area of mathematics. He did it in analysis (Riemann sums), in geometry 
(Riemannian metrics, later used by Einstein), in function theory (Riemann 
surfaces) — and in one paper that changed the course of number theory. He died 
quite young (around 40), but unlike some of his contemporaries had achieved 
wide recognition in his own lifetime for his advances. 
24.2.2 Motivating the Zeta function


The motivation for this definition comes from this function with the case s = 1.
We begin with the second formula in Fact 24.1.1:


$$\prod_{p\mid n}\left(1+\frac{1}{p}+\frac{1}{p^2}+\cdots+\frac{1}{p^e}\right)=\sum_{d\mid n}\frac{1}{d}.$$


Try computing both sides of this and seeing how they come together for a few
fairly composite n, like 12, 16, 18, 20, or 30.


@interact
def _(n=[30,20,18,24,12,16]):
    # the sum of the reciprocals of the divisors of n
    str = '$$'+' + '.join([r'\frac{1}{%s}'%d for d in
        divisors(n)])+'=%s$$'%sum([1/d for d in divisors(n)])
    # one factor (1 + 1/p + ... + 1/p^e) for each prime power in n
    str2 = '$$' + ''.join([r'\left('+'+'.join([r'\frac{1}{%s^{%s}}'%(p,k)
        for k in [0..e]])+r'\right)' for (p,e) in
        factor(n)]) + '=%s$$'%prod([sum([p^(-k) for k in
        [0..e]]) for (p,e) in factor(n)])
    pretty_print(html(str))
    pretty_print(html("compare to "+str2))


Notice how every integer d formable by a product of the prime powers 
dividing n shows up precisely once (as a reciprocal) in the sum. This gives us 
a way into introducing limits. 

What would happen if we introduced infinity in each term of the product, 
for instance? 


$$\left(1+\frac{1}{2}+\frac{1}{2^2}+\frac{1}{2^3}+\cdots\right)\left(1+\frac{1}{3}+\frac{1}{3^2}+\frac{1}{3^3}+\cdots\right)$$


By analogy, we should get a sum with exactly one copy of the reciprocal of
each number divisible by only 2 and 3, e.g.


$$=\sum_{\substack{n=2^e3^f\\ e,f\geq 0}}\frac{1}{n}$$


@interact
def _(e=(1,[0..6]),f=(2,[0..6])):
    n = 2^e*3^f
    pretty_print(html("You picked $%s=2^{%s}3^{%s}$"%(n,e,f)))
    # the sum of the reciprocals of the divisors of n
    str = '$$'+' + '.join([r'\frac{1}{%s}'%d for d in
        divisors(n)])+'=%s$$'%sum([1/d for d in divisors(n)])
    # one factor (1 + 1/p + ... + 1/p^e) for each prime power in n
    str2 = '$$' + ''.join([r'\left('+'+'.join([r'\frac{1}{%s^{%s}}'%(p,k)
        for k in [0..e]])+r'\right)' for (p,e) in factor(n)]) +
        '=%s$$'%prod([sum([p^(-k) for k in [0..e]]) for
        (p,e) in factor(n)])
    pretty_print(html(str))
    pretty_print(html("compare to "+str2))
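
To see what "introducing infinity" does for just the primes 2 and 3, one can truncate both geometric series at the same exponent K and let K grow. The sketch below is plain Python with exact fractions; the names `smooth_sum` and `truncated_product` are ours, chosen only for this illustration. It checks that the sum over numbers $2^e3^f$ with $e,f\leq K$ equals the product of the two truncated series, and prints how both approach a limit.

```python
from fractions import Fraction

def smooth_sum(K):
    """Sum of 1/(2^e * 3^f) over all 0 <= e, f <= K, as an exact fraction."""
    return sum(Fraction(1, 2**e * 3**f)
               for e in range(K + 1) for f in range(K + 1))

def truncated_product(K):
    """Product of the two geometric series, each cut off at exponent K."""
    two = sum(Fraction(1, 2**e) for e in range(K + 1))
    three = sum(Fraction(1, 3**f) for f in range(K + 1))
    return two * three

for K in [1, 2, 5, 10, 20]:
    # the two columns agree exactly; floats are used only for display
    print(K, float(smooth_sum(K)), float(truncated_product(K)))
```

Since the full geometric series sum to $1/(1-1/2)=2$ and $1/(1-1/3)=3/2$, the printed values should creep up toward $2\cdot 3/2=3$. With only finitely many primes involved, nothing diverges yet.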


There is no reason this wouldn’t continue to work for many prime factors.
Because every integer is uniquely represented as a product of prime powers
(Fundamental Theorem of Arithmetic), this implies that we might multiply out
the left-hand side of an infinite product of infinite sums to get


$$\prod_{p}\left(1+\frac{1}{p}+\frac{1}{p^2}+\frac{1}{p^3}+\cdots\right)=\sum_{n=1}^{\infty}\frac{1}{n}$$


Since each of the multiplied terms on the left is an infinite geometric series, we
can simplify the product slightly to write

$$\prod_{p}\left(\frac{1}{1-1/p}\right)=\sum_{n=1}^{\infty}\frac{1}{n}$$

24.2.3 Being careful 


So much for Euler’s contribution, a very impressive one. The only problem
with all this is that both of these things clearly diverge!

Thus we cannot use a simple equality (=) for this discussion. Nonetheless,
Euler’s intuition is spot on, and we will be able to fix this issue quite
satisfactorily. For now, what we can say is that, in some sense, the harmonic
series is also an infinite product:


$$\sum_{n=1}^{\infty}\frac{1}{n}\sim\prod_{p}\left(\frac{1}{1-1/p}\right)=\prod_{p}\left(1-p^{-1}\right)^{-1}$$


To make this rigorous, we should start talking about convergence. Recall
this informal version of the integral test for series (see for example Active
Calculus¹).


Proposition 24.2.4 Integral test for series convergence. Assume f is a
positive decreasing function going to zero as $x\to\infty$. Then the series
$\sum_{i=1}^{\infty}f(i)$ converges if and only if the integral
$\int_1^{\infty}f(x)\,dx$ converges.


How does this apply to our situation? The improper integral in the case of
$\zeta(s)$ is

$$\int_1^{\infty}x^{-s}\,dx.$$

As an example, in calculus one might have shown that
$\sum_{n=1}^{\infty}\frac{1}{n^2}$ converges by evaluating $\int_1^{\infty}\frac{dx}{x^2}$.
The general integral (for $s\neq 1$) evaluates as


$$\int_1^{\infty}x^{-s}\,dx=\left.\frac{x^{-s+1}}{-s+1}\right|_1^{\infty}=\frac{1}{s-1}\left(1-\lim_{x\to\infty}\frac{1}{x^{s-1}}\right).$$


For s a real number, this converges precisely when s > 1 (since that keeps x
in the denominator), which begins to inform us about $\zeta$.


Fact 24.2.5 The infinite sum $\zeta(s)$ converges for all s > 1.
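
The integral test gives more than bare convergence. Since $1/x^s$ is decreasing, the tail of the series past N is at most $\int_N^{\infty}x^{-s}\,dx=N^{1-s}/(s-1)$. Here is a hedged numerical sketch of that bound for s = 2 in plain Python; the function names are ours, and the value $\pi^2/6$ for $\zeta(2)$ is simply quoted (the text returns to it in Section 24.4).

```python
import math

def zeta_partial(s, N):
    """Partial sum 1/1^s + 1/2^s + ... + 1/N^s."""
    return sum(1.0 / n**s for n in range(1, N + 1))

def tail_bound(s, N):
    """Integral-test bound on the tail: sum_{n>N} 1/n^s <= N^(1-s)/(s-1), for s > 1."""
    return N ** (1 - s) / (s - 1)

zeta2 = math.pi**2 / 6  # quoted value of zeta(2), not derived here
for N in [10, 100, 1000, 10000]:
    partial = zeta_partial(2, N)
    # columns: N, partial sum, what is still missing, the integral bound
    print(N, partial, zeta2 - partial, tail_bound(2, N))
```

In every row the third column (the missing tail) should be positive and no larger than the fourth (the bound), and both shrink like 1/N.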

But why is the (infinite) product equal to this infinite sum too? Is this
product even meaningful? After all, it is not true in general that if a partial
product equals a partial sum, then the ‘full’ sum is the ‘full’ product.

One has to carefully set up the convergence. If we can show that the product
converges to the sum, then both will converge. Then it will make sense to say
that

$$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_{p}\left(\frac{1}{1-p^{-s}}\right)$$


¹activecalculus.org/single/sec-8-3-series.html
24.3 From Riemann to Dirichlet and Euler


In order to see this (the convergence of the infinite product), let’s instead
observe our other main example of a sum over divisors equalling a product
over primes working. When we compared them for $\phi$ above, we got

$$\prod_{p\mid n}\left(1-\frac{1}{p}\right)=\sum_{d\mid n}\frac{\mu(d)}{d}.$$


@interact
def _(e=(1,[0..3]),f=(2,[0..3]),g=(0,[0..3])):
    n = 2^e*3^f*5^g
    pretty_print(html("You picked $%s=2^{%s}3^{%s}5^{%s}$"%(n,e,f,g)))
    # the sum of mu(d)/d over the divisors d of n
    str = '$$'+'+'.join([r'\frac{%s}{%s}'%(moebius(d),d) for
        d in divisors(n)])+'=%s$$'%sum([moebius(d)/d for d
        in divisors(n)])
    # one factor (1 - 1/p) for each prime dividing n
    str2 = '$$'+''.join([r'\left(1-\frac{1}{%s}\right)'%p
        for (p,e) in factor(n)])+'=%s$$'%prod([1-1/p for
        (p,e) in factor(n)])
    pretty_print(html(str))
    pretty_print(html("compare to "+str2))


We could make the powers far higher, or include more primes, and it would
still work. Going to both limits, this would lead to the series


$$\sum_{n=1}^{\infty}\frac{\mu(n)}{n}$$

24.3.1 Dirichlet series 


We give such series a name. The following definition is purely formal, stated
without considering issues such as convergence. (See [E.2.8, Chapter 4.6]
for an interesting formal viewpoint on the set of these series.)


Definition 24.3.1 Dirichlet Series. In general, for an arithmetic function
f(n), its Dirichlet series is

$$F(s)=\sum_{n=1}^{\infty}\frac{f(n)}{n^s}$$
◊


Answer the following three questions to see if you understand this definition.
(See Exercise 24.7.1.)


• For what arithmetic function is the Riemann zeta function the Dirichlet
series?

• What would the Dirichlet series of N be?

• What about the Dirichlet series of I?


Note that this already indicates some level of connection between arithmetic
functions. These are connections which may not have been evident otherwise.
24.3.2 Euler products


For our purposes, the very important thing to note about such series is that
they often can be expanded as infinite products.


Definition 24.3.2 Euler Products. In general, for an arithmetic function
f(n), its Dirichlet series is said to have an Euler product if the series can be
written as an infinite product in the following manner:


$$\sum_{n=1}^{\infty}\frac{f(n)}{n^s}=\prod_{p}\left(\text{a formula involving }f(p)\text{ and }p^s\right).$$
◊


Example 24.3.3 Euler product for Riemann zeta function. We have
already suggested one for the zeta function:


$$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_{p}\left(\frac{1}{1-p^{-s}}\right)$$


Based on the logic of this section, we have a potential new Euler product
for the Dirichlet series of the Moebius function:


$$\sum_{n=1}^{\infty}\frac{\mu(n)}{n^s}=\prod_{p}\left(1-\frac{1}{p^s}\right)=\prod_{p}\left(1-p^{-s}\right)$$


At least, we can consider this wherever it makes sense. See [E.4.6, Chapter 11.5]
or [E.2.1, Chapter 9.8] for some criteria, or simply below at Theorem 24.5.4.

In the next section, we justify more of this discussion, and connect our
wonderful results about Dirichlet products of finite arithmetic functions to
deep properties of their Dirichlet series.
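
The two suggested Euler products can at least be tried numerically by multiplying over the primes up to some limit. What follows is a minimal sketch in plain Python, not a proof of anything: `primes_up_to`, `euler_product`, `zeta_factor` and `moebius_factor` are hypothetical helper names of our own, and the comparison value $\pi^2/6$ for $\zeta(2)$ is quoted rather than computed.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit (for limit >= 2)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit**0.5) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [p for p, is_p in enumerate(sieve) if is_p]

def euler_product(s, limit, factor):
    """Multiply factor(p, s) over all primes p <= limit."""
    prod = 1.0
    for p in primes_up_to(limit):
        prod *= factor(p, s)
    return prod

def zeta_factor(p, s):
    """Local factor 1/(1 - p^(-s)) suggested for the zeta function."""
    return 1.0 / (1.0 - p ** (-s))

def moebius_factor(p, s):
    """Local factor (1 - p^(-s)) suggested for the Moebius series."""
    return 1.0 - p ** (-s)

for limit in [10, 100, 1000, 10000]:
    print(limit,
          euler_product(2, limit, zeta_factor),     # compare with pi^2/6
          euler_product(2, limit, moebius_factor),  # compare with 6/pi^2
          math.pi**2 / 6, 6 / math.pi**2)
```

The partial products settle down quickly for s = 2, which is at least consistent with the formulas above; it says nothing yet about why they converge.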


24.4 Multiplication 


At the end of the previous section, you may have noticed something surprising.
The Euler products we obtained for the Riemann $\zeta$ function and the Dirichlet
series of the Möbius function are multiplicative inverses of each other:


$$\prod_{p}\left(\frac{1}{1-p^{-s}}\right)\quad\text{and}\quad\prod_{p}\left(1-p^{-s}\right)$$


We can check this numerically as well; in the following examples, we use s = 2.


sum([moebius(n)/n^2 for n in [1..10000]]).n()


0.607926897331474


1/zeta(2).n()


0.607927101854027

They agree up to quite a few digits when we approximate both representations
of the number, so that is a start at reasonability!

Finally, recall from our exploration of the average value of $\sigma$ in Section 20.4
that $\zeta(2)=\frac{\pi^2}{6}$ (though there we just used this as a sum, and didn’t call it
$\zeta(2)$). Compare this computation with the ones above.

1/(pi^2/6).n()


0.607927101854027


Remark 24.4.1 Zeta has interesting values at integers, not just for s = 2.
Euler calculated many even values of $\zeta$, which all look like $\pi^{2n}$ times a rational
number (see any description of the so-called Bernoulli numbers²). However,
it was only in 1978 that $\zeta(3)$ was shown to be irrational. It was then named
Apéry’s constant after the man who proved this, Roger Apéry. (See [E.5.12].)

To compare with the situation for even n, as of this writing it is still only
known that at least one of the next four odd values ($\zeta(5)$, $\zeta(7)$, $\zeta(9)$, $\zeta(11)$) is
irrational³. See Wadim Zudilin’s website⁴ for many links, though this page
hasn’t been updated for some time.

Let’s reinterpret this connection between the Euler products of the $\zeta$ function
and the Möbius series just a little bit. Assuming we can prove that all
this makes sense (which we haven’t, yet), we have the following two analogous
facts.


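Fact 24.4.2 The arithmetic functions $\mu$ and $u$ are inverses as arithmetic
functions; that is, $\mu\star u=I$.
The Dirichlet series of these functions are also inverses, as ordinary functions:

$$\zeta(s)\cdot\sum_{n=1}^{\infty}\frac{\mu(n)}{n^s}=\prod_{p}\left(\frac{1}{1-p^{-s}}\right)\cdot\prod_{p}\left(1-p^{-s}\right)=1$$


Alternately, $\sum_{n=1}^{\infty}\frac{\mu(n)}{n^s}=1/\zeta(s)$.


This analogy is not a coincidence.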


Theorem 24.4.3 Use the following notation:
• Take f(n) and g(n) to be two arithmetic functions.
• Let $h=f\star g$ be their Dirichlet product.
• Let F, G, H be the corresponding Dirichlet series (in the variable s).


If the series F and G converge absolutely for a particular s, then
H converges and H = FG for that s as well.
Proof. First, there is a key fact you may or may not have seen in
calculus, related to absolute convergence (see for example Active Calculus⁵).
Roughly speaking, when series converge absolutely, you can mess around with
them a lot with impunity. See, for instance, Mertens’ Theorem on convergence
of Cauchy products. Interestingly, neither [E.4.6] nor [E.2.1, Theorem
9.6] says much more about this in their presentation of this standard proof. See
Exercise 24.7.3 if you have not encountered this!
In any case, since F and G do converge absolutely, we can and will mess around
a lot with the product


$$F(s)G(s)=\sum_{n=1}^{\infty}\sum_{m=1}^{\infty}\frac{f(n)g(m)}{n^sm^s}$$


In particular, we can group the products by the terms (the same way
we did in proving things about $\star$ in Subsection 23.4.3), without loss of equality.
We can further group by when n and m are complementary divisors of the
same number (I suggest using specific numbers to try this out). This gives


$$F(s)G(s)=\sum_{d=1}^{\infty}\sum_{nm=d}\frac{f(n)g(m)}{d^s}$$


Notice that the inner sum is precisely the Dirichlet $\star$ product (except divided
by $d^s$). So we may rewrite this as

$$F(s)G(s)=\sum_{d=1}^{\infty}\frac{(f\star g)(d)}{d^s}$$

The numerators are the definition of h, so this is just H(s), as desired. (In
[E.4.6, Theorem 11.5] the additional detail that any Dirichlet series with these
values must be the one for $f\star g$ is proved, which requires a uniqueness result
for the series we will omit.) ∎


²en.wikipedia.org/wiki/Bernoulli_number

³And various other similar facts, such as Ball and Rivoal’s result that infinitely many
positive odd zeta values are irrational.

⁴www.math.ru.nl/~wzudilin/zw/

This is a quite remarkable and deep connection between the discrete/ 
algebraic point of view and the analytic/calculus point of view. It is a shame 
that this is not exploited more in the standard calculus curriculum, though see 
[E.6.8] for a very good resource for those who wish to do so. 
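
Theorem 24.4.3 can be watched in action on a small example. The sketch below (plain Python, with helper names `dirichlet_convolution` and `partial_series` that are ours and purely illustrative) takes f = g = u, the constant function 1. Then $h=u\star u$ counts divisors, so $h=\tau$, and the theorem predicts that the Dirichlet series of $\tau$ at s = 2 is $\zeta(2)^2$. All infinite sums are truncated at N, an assumption made only so the code can run.

```python
import math

def dirichlet_convolution(f, g, N):
    """Return a list h with h[n] = sum of f(d)*g(m) over d*m = n, for 1 <= n <= N."""
    h = [0] * (N + 1)          # h[0] is unused padding
    for d in range(1, N + 1):
        fd = f(d)
        for m in range(1, N // d + 1):
            h[d * m] += fd * g(m)
    return h

def partial_series(values, s):
    """Partial Dirichlet series: sum of values[n]/n^s for n = 1 .. len(values)-1."""
    return sum(values[n] / n**s for n in range(1, len(values)))

def u(n):
    """The constant arithmetic function u(n) = 1."""
    return 1

N = 5000
tau = dirichlet_convolution(u, u, N)      # tau[n] = number of divisors of n
H = partial_series(tau, 2)                # truncated series for h = u * u
F = partial_series([0] + [1] * N, 2)      # truncated series for u, i.e. zeta(2)
print(H, F * F, (math.pi**2 / 6) ** 2)
```

The truncated H is slightly smaller than F·F, because multiplying the two truncated series also produces pairs (n, m) with nm > N; this is exactly the regrouping that absolute convergence lets us ignore in the limit.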


24.5 More series and convergence 


24.5.1 A series for Euler phi and a general theorem 


We can now feel confident applying these amazing facts to calculate the
Dirichlet series of $\phi$ in terms of the Riemann $\zeta$ function. We’ll see a few facts along
the way which could serve as templates for many such investigations.


Fact 24.5.1 Call P the Dirichlet series for $\phi$; it converges for s > 2.

Proof. From Fact 23.3.2, we recall that $\phi\star u=N$. Also, we know from earlier
in this chapter that $\zeta$ is absolutely convergent for s > 1.

Then the Dirichlet series of $\phi$ is absolutely convergent as well, as

$$\sum_{n=1}^{\infty}\left|\frac{\phi(n)}{n^s}\right|\leq\sum_{n=1}^{\infty}\frac{n}{n^s}=\sum_{n=1}^{\infty}\frac{1}{n^{s-1}}$$

which converges by the integral test if s > 2. ∎


Fact 24.5.2 The series for N may also be written as $\zeta(s-1)$.

Proof. This follows just from writing it down, as each term in the infinite sum
is like that of $\zeta$ but with a different exponent after cancelling. ∎

We can do even better than this to get a single formula for the series P.


Fact 24.5.3 The series for $\phi$, P(s), evaluates as


$$P(s)=\sum_{n=1}^{\infty}\frac{\phi(n)}{n^s}=\frac{\zeta(s-1)}{\zeta(s)}$$

Proof. Recall that the Riemann zeta function is just the Dirichlet series for $u$;
the previous fact is that the series for N is $\zeta(s-1)$.
Apply Theorem 24.4.3 to the series for $\phi$ and $u$. When you multiply these two
series it should give the series for N, and we already showed it all converges.
Substitute in the formulas to get $P(s)\zeta(s)=\zeta(s-1)$ for s > 2, which suffices
to prove the fact. ∎


⁵activecalculus.org/single/sec-8-4-alternating.html

We can check this with Sage at any particular point if we wish.


sum([euler_phi(n)/n^3 for n in [1..10000]]).n()


1.36837198604112


(zeta(2)/zeta(3)).n()


1.36843277762021


It turns out that such Euler products (and hence nice computations) show 
up quite frequently. 


Theorem 24.5.4 If $\sum_{n=1}^{\infty}\frac{f(n)}{n^s}$ converges absolutely and f is
multiplicative, then

$$\sum_{n=1}^{\infty}\frac{f(n)}{n^s}=\prod_{p}\left(1+\frac{f(p)}{p^s}+\frac{f(p^2)}{p^{2s}}+\cdots\right).$$

Proof. Doing this is Exercise 24.7.2. We have a proof that Moebius $\mu$’s Dirichlet
series converges to its Euler product in the next subsection; the proof of this
is very similar, just more general. ∎


24.5.2 A missing step: Convergence of Dirichlet series 


Before we start using all these facts in the next section, we have to acknowledge 
there is a missing step thus far. Namely, we haven’t demonstrated much about 
convergence of these series or products, much less that they converge to each 
other. Although it is fun to play around, and numerical experimentation will 
convince you they are very likely, we need more to really use these tools with 
abandon. 

Our goal in this subsection is to prove for the Moebius $\mu$ function that
its Dirichlet series converges to the Euler product. Proofs for most other such
functions (such as the Riemann zeta function) are similar enough to leave more
general proofs to a graduate course.


Fact 24.5.5 For s > 1 we have


$$\sum_{n=1}^{\infty}\frac{\mu(n)}{n^s}=\prod_{p}\left(1-\frac{1}{p^s}\right)$$

Proof. This proof follows the outline of [E.2.1, Theorem 9.3a] closely; see also
[E.2.1, Theorem 9.2]. First we will come up with a way to write a partial
product as a specific sum. Then we will use this to get a precise error between
partial products and the infinite sum, and finally bound said error by something
going to zero, the final step of which we separate out as an independent claim.

We will begin with the identity we already know as defining $\mu$ in
Definition 23.1.1:

$$\sum_{d\mid n}\frac{\mu(d)}{d}=\prod_{p\mid n}\left(1-\frac{1}{p}\right)$$


Assuming we multiply these products out through the kth prime, we get


$$\prod_{i=1}^{k}\left(1-\frac{1}{p_i}\right)=1-\frac{1}{p_1}-\frac{1}{p_2}-\cdots+\frac{1}{p_1p_2}+\frac{1}{p_1p_3}+\cdots-\frac{1}{p_1p_2p_3}-\frac{1}{p_1p_2p_4}-\cdots=\sum_{\substack{n\text{ squarefree}\\ \text{only }p_i\mid n,\ 1\leq i\leq k}}\frac{\mu(n)}{n}$$


This certainly suggests the entire fact is true.
Next, let’s introduce the set


$$A_k=\{n\mid n=p_1^{e_1}p_2^{e_2}\cdots p_k^{e_k},\ e_i\geq 0\}$$


This is the set of all integers built out of the first k primes. Since $\mu(n)=0$
unless n has no higher prime powers, then in this notation the big right hand
side sum is equal to


$$\prod_{i=1}^{k}\left(1-\frac{1}{p_i}\right)=\sum_{\substack{n\text{ squarefree}\\ \text{only }p_i\mid n,\ 1\leq i\leq k}}\frac{\mu(n)}{n}=\sum_{n\in A_k}\frac{\mu(n)}{n}$$


Since the Fundamental Theorem of Arithmetic gives all these relations, I can
replace $p_i$ with $p_i^s$ with no harm and write


$$\prod_{i=1}^{k}\left(1-\frac{1}{p_i^s}\right)=\sum_{n\in A_k}\frac{\mu(n)}{n^s}$$


Our next step is to get a bound on the difference between the infinite series
and this partial product,

$$\sum_{n=1}^{\infty}\frac{\mu(n)}{n^s}-\prod_{i=1}^{k}\left(1-\frac{1}{p_i^s}\right).$$

By the work we just did, this is $\sum_{n\notin A_k}\frac{\mu(n)}{n^s}$. This is the difference
between the infinite sum and the partial product through the kth prime. Further, we know
this error is finite for any given allowable s, because it’s bounded by $\pm\zeta$, and
$\zeta$ converges absolutely for s > 1 (recall the comparison test for infinite series).


Let’s put absolute values on this error bound:

$$\left|\sum_{n=1}^{\infty}\frac{\mu(n)}{n^s}-\prod_{i=1}^{k}\left(1-\frac{1}{p_i^s}\right)\right|=\left|\sum_{n\notin A_k}\frac{\mu(n)}{n^s}\right|$$

To get a more explicit bound, we now deduce that any $n\notin A_k$ must satisfy $n>p_k$,
since n must have at least one prime factor beyond the first k primes. Armed with
this, the following Claim 24.5.6 will finish the proof, by bounding the error above
by $\sum_{n>p_k}\frac{1}{n^s}$.


The latter error bound $\sum_{n>p_k}\frac{1}{n^s}$ must go to zero as $k\to\infty$, since this is the tail
of a convergent infinite series. That means that the partial products converge
to the series; we know that is finite, so everything converges and we have our
Euler product for this Dirichlet series! ∎
Claim 24.5.6 With all notation as in Fact 24.5.5, we have


$$\left|\sum_{n\notin A_k}\frac{\mu(n)}{n^s}\right|\leq\sum_{n\notin A_k}\left|\frac{\mu(n)}{n^s}\right|\leq\sum_{n>p_k}\left|\frac{\mu(n)}{n^s}\right|\leq\sum_{n>p_k}\frac{1}{n^s}$$

Proof. The first inequality follows if we can put the absolute value inside the
summation, which is an extended triangle inequality. To be careful, we should
wish the series involved to actually converge, but we already showed this in
the proof of the main fact.
The second inequality is due to the fact that any $n\notin A_k$ must be bigger than
$p_k$, so the set of all integers above $p_k$ would just yield a bigger sum (since all
terms are now nonnegative after the first step).
The final inequality uses that $\mu(n)=0,1,-1$ always. ∎
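
The chain of inequalities in Claim 24.5.6 can be illustrated numerically for s = 2. This is only a sketch in plain Python under a stated assumption: every infinite sum is truncated at N, so the numbers are approximations and not part of the proof. The helper names `moebius_sieve` and `euler_product_errors` are ours.

```python
def moebius_sieve(N):
    """Return a list mu with mu[n] the Moebius value of n for 1 <= n <= N."""
    mu = [1] * (N + 1)         # mu[0] is unused padding
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(2 * p, N + 1, p):
                is_prime[m] = False
            for m in range(p, N + 1, p):
                mu[m] = -mu[m]          # one more prime factor: flip the sign
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0               # divisible by a square
    return mu

def euler_product_errors(s, N, primes):
    """Rows (p_k, |series - partial product through p_k|, sum of 1/n^s for n > p_k).

    Both infinite sums are cut off at N (an approximation for illustration).
    """
    mu = moebius_sieve(N)
    series = sum(mu[n] / n**s for n in range(1, N + 1))
    rows, partial = [], 1.0
    for p in primes:
        partial *= 1 - p ** (-s)
        error = abs(series - partial)
        bound = sum(1 / n**s for n in range(p + 1, N + 1))
        rows.append((p, error, bound))
    return rows

for p, error, bound in euler_product_errors(2, 20000, [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]):
    print(p, error, bound)
```

In each row the error should sit well below the bound, and both columns shrink as more primes are included, which is the content of the last step of the proof.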


24.6 Four Facts 


We are now ready to work with four applied facts which we can prove, using 
these tools. Some have other types of proofs, but number theory combined with 
calculus really provides a unified framework for a huge number of problems. 


• In Subsection 24.6.1, we will show that the probability that a random
integer lattice point is ‘visible’ from the origin is $\frac{6}{\pi^2}$; this is
Proposition 24.6.2.


• In Subsection 24.6.2, we see that the Dirichlet series for $f(n)=|\mu(n)|$ is
$\zeta(s)/\zeta(2s)$; this is Proposition 24.6.3.


• In Subsection 24.6.4, we compute the average value of $\phi(n)$ to be $\frac{3n}{\pi^2}$; this
is Proposition 24.6.7.


• In Subsection 24.6.3, we see that the prime harmonic series sum
$\sum_{n=1}^{\infty}\frac{1}{p_n}$ diverges, with $p_n$ the nth prime; this is Proposition 24.6.4.


24.6.1 Random integer lattice points 


The following graphic will indicate what it means to have a point visible from 
the origin; is there a point directly between it and the origin or not? To 
rephrase, what is the probability that a point in the integer lattice has a line 
connecting the point to the origin that does not hit any other point? (We will 
explicitly avoid any discussion of why such infinitary probabilities are defined 
in this introductory text.) 
Figure 24.6.1 Integer lattice points visible from the origin through n = 5


For this example, the probability is about 0.66, but the theoretical
probability will not be two-thirds! We will as usual want an interactive version
too.


@interact
def _(viewsize=(5,[3..25])):
    var('x,y')
    P=Graphics()
    # every lattice point in the viewing window, drawn small and black
    grid_pts = [[i,j] for i in [-viewsize..viewsize] for j
        in [-viewsize..viewsize]]
    P += points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
    # the visible points are those with relatively prime coordinates
    lattice_pts = [coords for coords in grid_pts if
        (gcd(coords[0],coords[1])==1)]
    P += points(lattice_pts, rgbcolor = (0,0,1),
        pointsize=10)
    linesegs=[line([[0,0],[spot[0],spot[1]]],
        rgbcolor=(1,0,0),linestyle="--",thickness=.5) for
        spot in lattice_pts]
    for object in linesegs:
        P += object
    show(P, figsize = [5,5], xmin = -viewsize, xmax =
        viewsize, ymin = -viewsize, ymax = viewsize,
        aspect_ratio=1)
    pretty_print(html(r"Probability in view is $\approx %s$"%(
        Integer(len(lattice_pts)) /
        Integer(len(grid_pts))).n()))
    pretty_print(html(r"Theoretical probability is $1/\zeta(2)\approx %s$"%(
        1/zeta(2)).n()))


Note that the probabilities estimated by this interact vary wildly. Especially
at a prime distance one should expect the computed probability to be higher
than the theoretical one; why?

It should be pretty clear from the pictures that if x and y have a nontrivial
common divisor d, then $\left(\frac{x}{d},\frac{y}{d}\right)$ is right on the line of sight from the origin to
(x, y) so that it is blocked off. This is most clearly so for d = gcd(x, y), so the
following fact is the same thing as asking for the probability that two randomly
chosen integers are relatively prime.

Proposition 24.6.2 The probability that a random integer lattice point is visible
from the origin is $\frac{6}{\pi^2}$.

Proof. We will prove the statement about coprime random integers, or at least
we will prove as much of it as we can without discussing infinite combinations
of independent chances. We will also make an assumption about distribution
of primes to simplify the proof; one can consider this a sketch, if necessary.
First, we know that gcd(x, y) = 1 is true precisely if x and y are never
simultaneously congruent to zero modulo p, for any prime p. (If there were such a
p, of course it would be a divisor; by the Fundamental Theorem of Arithmetic
we need only consider primes.)

For any given prime p, the chance that two integers will both be congruent to
zero is


$$\frac{1}{p}\cdot\frac{1}{p}=\frac{1}{p^2}.$$


This works because x and y are chosen independently, so for a fixed p we
can just multiply probabilities.
Hence the probability that at least one of x or y will not be divisible by p is


$$1-\frac{1}{p}\cdot\frac{1}{p}=1-\frac{1}{p^2}.$$


(This may remind you of the so-called birthday problem in probability.)

Now comes our assumption. We will suppose that if $p\neq q$ are both prime,
then the probability that any given integer is divisible by p has nothing to do
with whether it is divisible by q. (Such properties are not true in general; if n
is divisible by 4 it has a 100% likelihood of being divisible by 2, while if n is
prime, it has almost no chance of being even.)

In such a case the probabilities are independent, so that (even in this infinitary
case)


$$\prod_{p}\left(1-\frac{1}{p^2}\right)=\frac{1}{\zeta(2)}=\frac{6}{\pi^2}.$$

We may note (as in the more extended discussion in [E.2.1, Chapter 9.4]) by
using Fact 24.4.2 that this is also the value of the Dirichlet series of $\mu$ evaluated
at s = 2. ∎

This implies that a random pair of integers, selected from a large enough
bound, will be relatively prime about 61% of the time. See this Numberphile
video⁶ for a real-time experiment on Twitter⁷ doing something analogous with
triples in order to estimate Apéry’s constant $\zeta(3)$.


(6/pi^2).n()


0.607927101854027


⁶www.youtube.com/watch?v=ur-iLy4z3QE
⁷Sounds like an extra-credit project to me.
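
The claim about relatively prime pairs can be sampled directly. The following hedged sketch in plain Python counts coprime pairs with both coordinates between 1 and N; note that this looks only at the positive quadrant, unlike the interact above, which also includes the axes and the origin. The function name `coprime_fraction` is ours.

```python
import math

def coprime_fraction(N):
    """Fraction of pairs (x, y) with 1 <= x, y <= N having gcd(x, y) == 1."""
    count = sum(1 for x in range(1, N + 1)
                  for y in range(1, N + 1)
                  if math.gcd(x, y) == 1)
    return count / N**2

for N in [10, 100, 500]:
    print(N, coprime_fraction(N), 6 / math.pi**2)
```

The fractions wobble for small N but should settle near 0.6079 as N grows, in line with Proposition 24.6.2.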
24.6.2 Dirichlet for the absolute Moebius


Proposition 24.6.3 The Dirichlet series for $|\mu(n)|$ is $\zeta(s)/\zeta(2s)$.

Proof. With all the tools we’ve gained, the proof⁸ is nearly completely symbolic
at this point!

First, we have the following from the definition of Moebius in Definition 23.1.1,
or from Fact 24.5.5:

$$\sum_{n=1}^{\infty}\frac{|\mu(n)|}{n^s}=\prod_{p}\left(1+\frac{1}{p^s}\right)$$

Next, let us write $x=\frac{1}{p^s}$; then we can use the basic identity
$(1+x)=\frac{1-x^2}{1-x}$ to rewrite the right-hand side as


$$\prod_{p}\frac{1-\frac{1}{p^{2s}}}{1-\frac{1}{p^s}}=\frac{\prod_{p}\left(1-\frac{1}{p^{2s}}\right)}{\prod_{p}\left(1-\frac{1}{p^s}\right)}$$


Now we just invert both numerator and denominator to get familiar friends:


$$\frac{\prod_{p}\left(1-\frac{1}{p^{2s}}\right)}{\prod_{p}\left(1-\frac{1}{p^s}\right)}=\frac{\prod_{p}1/\left(1-\frac{1}{p^s}\right)}{\prod_{p}1/\left(1-\frac{1}{p^{2s}}\right)}$$


which means the sum will be $\zeta(s)/\zeta(2s)$. ∎

Let’s try this out computationally.


@interact
def _(s=[2,3,4,5,6]):
    S = sum([abs(moebius(n))/n^s for n in [1..10000]]).n()
    S2 = zeta(RR(s))/zeta(2*RR(s))
    pretty_print(html("The approximation is $%s$ while the zeta computation is $%s$."%(S,S2)))


Computing these series doesn’t stop here, of course! For example, something
analogous can be said about the Dirichlet series for multiples $f(n)|\mu(n)|$
for certain types of f; see [E.4.6, Exercise 11.13] for a precise statement.


24.6.3 The prime harmonic series 


The divergence of the series created from the reciprocals of prime numbers is
not necessarily a particularly obvious fact. Certainly it diverges much, much
more slowly than the harmonic series (recalled before Definition 20.3.10), which
already diverges very slowly. Euler showed this in 1737⁹.


@interact
def _(n=[10,100,1000,10000,100000,1000000]):
    out = sum([RealField(100)(1/p) for p in
        primes_first_n(n)])
    pretty_print(html("The sum of the reciprocals of the first $%s$ primes is $\\approx %s$"%(n,out)))


⁸This result is the first half of [E.2.1, Exercise 9.14], where it is then applied to the
Liouville $\lambda$ function of Definition 23.3.4 in an interesting way.
⁹scholarlycommons.pacific.edu/euler-works/72/

This proof doesn’t actually use Dirichlet series, but has in common with
them themes of convergence and estimation, so it is appropriate to include
here.


Proposition 24.6.4 Prime harmonic series diverges. Let $p_n$ be the nth
prime. Then the following series, which we call the prime harmonic series,
diverges:

$$\sum_{n=1}^{\infty}\frac{1}{p_n}$$

Proof. This is a fairly expanded form of the proofs in [E.2.1, Theorem 9.2] and
[E.4.6, Theorem 1.13], which the latter attributes to Clarkson in the
Proceedings of the AMS.
As with many other occasions to prove series divergence, we will focus on the
‘tails’ beyond a point that keeps getting further out. In this case, we’ll choose
the ‘tail’ beyond the first k primes,

$$\sum_{n>k}\frac{1}{p_n}.$$

By examining certain terms in this, and assuming (falsely) that it can be made
finite, we will obtain a contradiction.
First, let’s consider numbers of the form


$$p_1p_2p_3\cdots p_k\cdot r+1=p_k\#\cdot r+1$$


(Recall the primorial notation from Definition 22.2.7.) Such a number cannot
be divisible by any of those first k primes, so by the Fundamental Theorem of
Arithmetic any number like $p_k\#\cdot r+1$ may be factored as


$$p_{n_1}p_{n_2}\cdots p_{n_\ell}$$


where all $n_i>k$ (some may be repeated).
Return to the ‘tail’. Since this $p_k\#\cdot r+1$ factors with $\ell$ factors, then somewhere
in the $\ell$th power of the ‘tail’ we have the following term:


$$\left(\sum_{n>k}\frac{1}{p_n}\right)^{\ell}=\cdots+\frac{1}{p_1p_2p_3\cdots p_k\cdot r+1}+\cdots$$


Now assume that in fact the prime harmonic series converges; we will derive a
contradiction.

First, for some k, the ‘tail’ T is less than $\frac{1}{2}$, i.e. $T=\sum_{n>k}\frac{1}{p_n}<\frac{1}{2}$. Since each
term is positive, T > 0 and a geometric series involving the $\ell$th powers of T is
very precisely bounded:


$$0<\sum_{\ell=1}^{\infty}T^{\ell}=\sum_{\ell=1}^{\infty}\left(\sum_{n>k}\frac{1}{p_n}\right)^{\ell}<\sum_{\ell=1}^{\infty}\frac{1}{2^{\ell}}=1$$

Now we bring in the first discussion in this proof; every single term of the form
$\frac{1}{p_k\#\cdot r+1}$ will appear somewhere within this sum of the $\ell$th powers, though
naturally $\ell$ in each case will depend heavily upon r.


So the series of reciprocals of just these special terms is bounded.


$$\sum_{r=1}^{\infty}\frac{1}{p_1p_2p_3\cdots p_k\cdot r+1}\leq\sum_{\ell=1}^{\infty}\left(\sum_{n>k}\frac{1}{p_n}\right)^{\ell}<1$$

A bounded series of all positive numbers should converge (e.g. by comparison).


Here comes the contradiction. The same series is bounded below as follows,
for each integer k.


$$\sum_{r=1}^{\infty}\frac{1}{p_1p_2p_3\cdots p_k\cdot r+1}>\sum_{r=1}^{\infty}\frac{1}{p_1p_2p_3\cdots p_k\cdot r+p_1p_2p_3\cdots p_k}=\frac{1}{p_1p_2p_3\cdots p_k}\sum_{r=1}^{\infty}\frac{1}{r+1}$$


This series certainly diverges, as a multiple of the tail of the harmonic series!

Since no matter how big k is (and hence how far out in the ‘tail’ we go) we find
that a certain series both converges and diverges, we have a contradiction.
Hence our original assumption that we could choose k to make T finite was
false, and the prime harmonic series must diverge! ∎
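
The key observation of the proof, that numbers of the form $p_k\#\cdot r+1$ factor using only primes beyond the first k, is easy to look at concretely. Here is a small sketch in plain Python for k = 3, so that $p_k\#=30$; the helper `prime_factors` is our own hypothetical name, and trial division is used only because the numbers are tiny.

```python
def prime_factors(n):
    """Prime factors of n, listed with repetition in increasing order."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

primorial = 2 * 3 * 5   # p_3# = 30, built from the first k = 3 primes
for r in range(1, 11):
    m = primorial * r + 1
    factors = prime_factors(m)
    # ell is the number of prime factors, i.e. which power of the tail contains 1/m
    print(r, m, factors, "ell =", len(factors))
```

Every factor listed should exceed 5, and the number of factors $\ell$ changes from one r to the next, just as the proof says it will.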


24.6.4 The average value of Euler phi 


Finally, here is a really nice result to end with. Thinking about the average
value of $\phi$ will put together many themes from this text.

You may recall Section 20.5, and in particular Exercise 20.6.17, where you
were asked to conjecture regarding this question. As there, it’s useful here to
try to graph the average values first; here I have included the correct long-term
average as well.


Figure 24.6.5 Average value of $\phi$ versus $\frac{3}{\pi^2}x$


Before formally proving this, let’s look at a significant picture for
conceptualizing the proof. This is similar to what we used for the average of $\tau$ and $\sigma$
in Chapter 20.

Figure 24.6.6 Integer lattice with labeled points


The text at each lattice point is the value of the horizontal coordinate,
multiplied by a factor of Moebius of the vertical coordinate. You can try it
interactively if you are online.


@interact
def _(n=(6,list(range(2,50)))):
    viewsize=n+1
    g(x)=1/x
    P=Graphics()
    # the hyperbola xy = n
    P += plot(n*g,(x,0,n+1))
    grid_pts = [[i,j] for i in [1..viewsize] for j in
        [1..viewsize]]
    P += points(grid_pts,rgbcolor=(0,0,0),pointsize=2)
    # label the lattice points on or under the hyperbola
    lattice_pts = [coords for coords in grid_pts if
        (coords[0]*coords[1]<=n)]
    for thing in lattice_pts:
        P += text(moebius(thing[1])*thing[0],
            thing,rgbcolor=(0,0,0))
    show(P,ymax=viewsize,aspect_ratio=1)


We will crucially use two earlier facts in the proof:


• From above (e.g. Fact 24.4.2),


$$\sum_{n=1}^{\infty}\frac{\mu(n)}{n^2}=\frac{1}{\zeta(2)}=\frac{6}{\pi^2}$$


• From the previous chapter (e.g. Fact 23.3.2),


$$\phi=\mu\star N\Rightarrow\phi(n)=\sum_{d\mid n}\mu(d)\frac{n}{d}$$


This proof is based loosely on [E.4.6, Theorem 3.7]. See [E.2.8, Theorem 3.8.1]
for a more detailed approach which is rewarded with a very nice error estimate
(unusual in that it starts its discussion of averages with this example!). Both
books have much more related material, including useful (if difficult) exercises
such as finding a bound for the sum of reciprocals of $\phi$.


Proposition 24.6.7 The long-term average value of $\phi$ is given by

$$\lim_{n\to\infty}\frac{\frac{1}{n}\sum_{k=1}^{n}\phi(k)}{\frac{3}{\pi^2}n}=1.$$


Proof. Consider the summation function for $\phi$, $\sum_{k=1}^{n}\phi(k)$. As in Chapter 20,
we will think of it as summing things up in two different ways.
In particular, look at the summation once we have replaced $\phi$ with the Moebius
sum inside:

$$\sum_{k=1}^{n}\phi(k)=\sum_{k=1}^{n}\sum_{d\mid k}\mu(d)\frac{k}{d}.$$



Now instead of considering it as a sum over divisors for each $k$, we can think of
it as summing for each divisor over the various hyperbolas $xy=k$. This yields

$$\sum_{k=1}^{n}\sum_{d\mid k}\mu(d)\frac{k}{d}=\sum_{d=1}^{n}\mu(d)\left(\sum_{k=1}^{\lfloor n/d\rfloor}k\right).$$



Now let’s examine the terms of this sum. We will several times use Landau
notation as in Definition 20.1.2.
Knowing about the sum of the first few consecutive integers (also used at the
end of Subsection 20.3.2), we see that

$$\sum_{k=1}^{\lfloor n/d\rfloor}k=\frac{1}{2}\left(\frac{n}{d}\right)^2+O\left(\frac{n}{d}\right).$$


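If we plug that in the double sum, we get

$$\sum_{d=1}^{n}\mu(d)\left(\frac{1}{2}\left(\frac{n}{d}\right)^2+O\left(\frac{n}{d}\right)\right)=\frac{1}{2}n^2\sum_{d=1}^{n}\frac{\mu(d)}{d^2}+n\sum_{d=1}^{n}O\left(\frac{1}{d}\right).$$

Let’s examine this.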


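• The sum in the first term goes to $\frac{6}{\pi^2}$ as $n\to\infty$ (by the first fact
above). Further, its error is $O(1/n)$, because
$|\mu(n)|/n^2\leq 1/n^2$ and $\int x^{-2}\,dx=-x^{-1}$.

• The second term is certainly $O(n\log(n))$, since it is $n$ times a sum which
is less than something $O(\log(n))$ (namely, the harmonic sum $\sum_{d=1}^{n}\frac{1}{d}$).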


Plugging everything in, we get that

$$\sum_{k=1}^{n}\phi(k)=\frac{1}{2}\cdot\frac{6}{\pi^2}n^2+O(\text{stuff less than }n^2).$$


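Dividing by $n$ and taking the limit, we get the asymptotic result.

$$\lim_{n\to\infty}\frac{\frac{1}{n}\sum_{k=1}^{n}\phi(k)}{\frac{3}{\pi^2}n}=\lim_{n\to\infty}\frac{\frac{1}{n}\left(\frac{3}{\pi^2}n^2+O(\text{stuff less than }n^2)\right)}{\frac{3}{\pi^2}n}=\lim_{n\to\infty}\frac{\frac{3}{\pi^2}n+O(\text{stuff less than }n)}{\frac{3}{\pi^2}n}=1.$$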
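If you would like to see the two steps of this proof numerically, here is a minimal sketch in plain Python (not a Sage cell). It checks that regrouping the double sum along the hyperbolas gives the same total as adding up $\phi(k)$ directly, and then compares the average with $\frac{3}{\pi^2}n$. The helper names and the cutoff `N = 20000` are my own choices for illustration, not anything from the text.

```python
# Plain-Python sketch; helper names and N are illustrative assumptions.
from math import pi

def phi_and_mu_sieve(limit):
    """Return lists phi[0..limit] and mu[0..limit] computed with a linear sieve."""
    phi = [0] * (limit + 1)
    mu = [0] * (limit + 1)
    phi[1], mu[1] = 1, 1
    primes = []
    is_composite = [False] * (limit + 1)
    for i in range(2, limit + 1):
        if not is_composite[i]:
            primes.append(i)
            phi[i], mu[i] = i - 1, -1
        for p in primes:
            if i * p > limit:
                break
            is_composite[i * p] = True
            if i % p == 0:
                # p divides i, so i*p is not squarefree
                phi[i * p] = phi[i] * p
                mu[i * p] = 0
                break
            phi[i * p] = phi[i] * (p - 1)
            mu[i * p] = -mu[i]
    return phi, mu

def phi_sum_direct(n, phi):
    """Sum of phi(k) for k = 1..n."""
    return sum(phi[1:n + 1])

def phi_sum_hyperbola(n, mu):
    """Sum over d of mu(d) * (1 + 2 + ... + floor(n/d))."""
    total = 0
    for d in range(1, n + 1):
        m = n // d
        total += mu[d] * (m * (m + 1) // 2)
    return total

N = 20000
phi, mu = phi_and_mu_sieve(N)
direct = phi_sum_direct(N, phi)
regrouped = phi_sum_hyperbola(N, mu)
average = direct / N
prediction = 3 * N / pi**2
print(direct, regrouped, average, prediction, average / prediction)
```

The last printed number is the ratio from Proposition 24.6.7; it should be close to 1, and raising `N` lets you watch how quickly it settles.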


24.7 Exercises 


1. Write down your answers to the three questions about the definition of 
Dirichlet series after Definition 24.3.1. 

2. Prove Theorem 24.5.4 in full generality, following the proof of Fact 24.5.5. 
(This is a good technical exercise in convergence.) 


3. Look up, or prove from scratch, that the ‘alternating harmonic series’
$\sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k}$ is convergent, but not absolutely convergent. Look up, or
prove from scratch, the value of this series; then find a rearrangement of
it that sums to precisely half the usual value. (Extra credit if you do so
without referencing anything connected to the university IUPUI.)


Exercise Group. The sum of the reciprocals of all primes is a very nuanced 
thing; here are some additional exercises about it. 


4. Learn more about the notion of zero density (recall Subsec- 
tion 22.2.2). Then find other (ordered) subsets of the positive integers 
like P = { primes } such that the sum of the reciprocals of the set di- 
verges, but the set has zero density in the integers. 


5. Use Sage or other computational tools to conjecture the rate of growth
of the function

$$f(x)=\sum_{p\leq x}\frac{1}{p}$$

where $p$ is of course prime. Hint: Typically one needs lumber to print
a book, such as [E.4.5] (but don’t peek there until you’re really stuck!).
6. Recall $\omega$ from Definition 23.3.3 and $f(x)$ from the previous question.
Confirm numerically that the average value to $x$ (in the sense of Chapter 20)
of $\omega$ is about the same as the size of $f(x)$. Give a reason why
$\sum_{p\leq x}\frac{1}{p}$ should be related to $\sum_{n\leq x}\omega(n)$.


7. Find an exercise about averages of arithmetic functions, Dirichlet series, 
or Euler products in [E.4.6, Chapters 3 and 11] and create a Sage cell to 
verify the result computationally. Then do the actual exercise, and report 
back comparing the two experiences. 


8. Following [E.7.35], let a point $(r,s)$ be $b$-visible from the origin ($b$ a positive
integer) if it lies on the graph of some $y=ax^b$ for $a\in\mathbb{Q}$ and there
is no other lattice point between that point and the origin on the curve.
Theorem 1 of their paper is that the proportion of $b$-visible points is $\frac{1}{\zeta(b+1)}$.
Verify this experimentally using graph paper or a computer for $b=2$.

Exercise Group. The following exercises follow naturally from Exercise 23.5.11,
and introduce another famous number-theoretic function defined via series.

9. Continuing with the idea in Exercise 24.7.3, make the Dirichlet sum
$\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n^s}$. Show this should converge absolutely if $s>1$.

10. We call the function $\eta(s)=\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n^s}$ the Dirichlet eta function.
Prove that $\eta(s)=\left(1-2^{1-s}\right)\zeta(s)$ in one (or all) of four ways:


• Use Exercise 23.5.11 and methods of this chapter.

• Use manipulations as in Section 24.6.

• Manipulate $\zeta(s)$ itself by seeing what happens if you subtract
the even terms a couple times.

• Use Exercise 18.3.5 to come up with an Euler product for $\eta$.


Summary: Infinite Sums and Products 


Our penultimate chapter asks what happens if we take our formulas for 
arithmetic functions and add infinity to the mix. 


1. The first section, Section 24.1, examines the connection between products
and sums for arithmetic functions.

2. Then we define the Riemann zeta function and examine some of its basic
properties.

3. What happens more generally when we go to infinity? We get Dirichlet
Series and Euler Products.

4. The next section examines multiplication of these infinite series and
products in Theorem 24.4.3.

5. We then investigate how these infinite processes work with the $\zeta$ function,
as well as show technical details of convergence in Fact 24.5.5.

6. In the final section we can now prove Four Facts of high interest, including
my favorite, Proposition 24.6.2.

7. The Exercises begin winding down, as we give more conceptual activities.

Chapter 25


Further Up and Further In 


If you survived this book, hooray! You made it. You did a great job making it 
through a whole arc of number theory accessible at the undergraduate level. 

Although we really did see a lot of the problems out there, there are many 
we did not see all the way through. We were able to prove some things about 
them. Here are just a few problems we started touching on. 


• Solving higher-degree polynomial congruences, like $x^3\equiv a$ (mod $n$).
(Chapter 7)

• Knowing how to find the first nontrivial integer point on hard things like
the Pell (hyperbola) equation $x^2-ny^2=1$. (Chapter 15)

• Writing a number not just in terms of a sum of squares, but a sum of
cubes, or a sum like $x^2+7y^2$. (Chapter 14)

• The Prime Number Theorem, and finding ever better approximations to
$\pi(x)$. (Chapter 21)


It’s this last one we will focus on in this extended postscript, for it takes 
us to the very frontiers of the deepest questions about numbers. 


25.1 Taking the PNT Further 


Recall Gauss’ approximating function for $\pi(x)$, the logarithmic integral
function (Definition 21.2.2). Let’s remind ourselves just how well it performs.

Figure 25.1.1 Prime counting function along with Gauss and another 


As we can see, it wasn’t too bad of an estimate. But, as mathematicians,
we hope we could get a little closer. At the end of Subsection 21.3.1 we tried
(among several other things) the fairly weird amended function

$$Li(x)-\frac{1}{2}Li(\sqrt{x}).$$

This was indeed a better approximation (in red in the graphic above). You
can try it interactively below.


@interact
def _(n=(1000,(1000,10^6))):
    # Plot pi(x), Li(x), and the amended estimate on the window [n-1000, n]
    P = plot(prime_pi, n-1000, n, color='black', legend_label=r'$\pi(x)$')
    P += plot(Li, n-1000, n, color='green', legend_label='$Li(x)$')
    P += plot(lambda x: Li(x) - .5*Li(sqrt(x)), n-1000, n, color='red',
              legend_label=r'$Li(x)-\frac{1}{2}Li(\sqrt{x})$')
    show(P)


This second estimate seems better. One might think one could keep adding
and subtracting

$$\frac{1}{n}Li\left(x^{1/n}\right)$$

to get even closer, with this start to the pattern.

As it turns out, that is not quite the right pattern. In fact, the minus
sign comes from $\mu(2)$, not from alternating powers of $-1$. You may try it
interactively below:


@interact
def _(n=(1000,(1000,10^6)), k=(3,[1..10])):
    P = plot(prime_pi, n-1000, n, color='black', legend_label=r'$\pi(x)$')
    P += plot(Li, n-1000, n, color='green', legend_label='$Li(x)$')
    # Moebius-weighted sum of the first k terms
    F = lambda x: sum([Li(x^(1/j))*moebius(j)/j for j in [1..k]])
    P += plot(lambda x: Li(x) - .5*Li(sqrt(x)), n-1000, n, color='red',
              legend_label=r'$Li(x)-\frac{1}{2}Li(\sqrt{x})$')
    P += plot(F, n-1000, n, color='blue',
              legend_label=r'$\sum_{j=1}^{%s} \frac{\mu(j)}{j} Li(x^{1/j})$'%k)
    show(P)


From anything one can see in the preceding interact, this set of
approximations doesn’t seem to add any accuracy beyond $k=3$. In fact, at
$x=1000000$, taking the approximation with the sum $\sum_{j=1}^{3}\frac{\mu(j)}{j}Li(x^{1/j})$ is
essentially the same as going all the way to infinity in $\sum_{j=1}^{\infty}\frac{\mu(j)}{j}Li(x^{1/j})$. More
importantly, both of these are clearly not integers, so this type of analysis
alone will not yield a computable, exact formula for $\pi(x)$. So here are some
questions we might raise.


• Where does the Moebius $\mu$ in that approximation come from anyway?

• Since this wasn’t enough, what else is involved in the error
$|\pi(x)-Li(x)|$?

• Are there connections with things other than just $\pi(x)$?

• What does this have to do with winning a million dollars?


25.2 Improving the PNT 


The following Table 25.2.1 shows the errors in Gauss’ and our new estimate
for every hundred thousand up to a million. Clearly Gauss is not exact (recall
Figure 21.2.4), but the other error is not always perfect either.

Table 25.2.1 Errors between $\pi(x)$, the log integral, and a Möbius estimate

$i$        $\pi(i)$   $\pi(i)-Li(i)$   $\pi(i)-\sum_{j=1}^{3}\frac{\mu(j)}{j}Li(i^{1/j})$
100000     9592       -36.71           3.882
500000     41538      -67.50           7.087
1000000    78498      -129.0           -31.00
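If you are not online, errors like those in the table can be approximated in plain Python. The sketch below is mine, not the book’s Sage cell: it computes the log integral from its standard series in $\log(x)$, and it assumes (as I believe is the case) that Sage’s `Li` is the offset integral starting at 2. All helper names are illustrative, and small differences from the printed table would not be surprising since the numerical route is different.

```python
# Plain-Python sketch; helper names are illustrative, not Sage's API.
from math import log, isqrt

EULER_GAMMA = 0.5772156649015329

def li(x, terms=200):
    """Logarithmic integral for x > 1: gamma + log(log x) + sum of u^k / (k * k!)."""
    u = log(x)
    total, term = EULER_GAMMA + log(u), 1.0
    for k in range(1, terms + 1):
        term *= u / k          # term is now u^k / k!
        total += term / k
    return total

def Li(x):
    """Offset log integral (integral of 1/log t from 2 to x); assumed to match Sage's Li."""
    return li(x) - li(2)

def prime_counts(limit):
    """counts[m] = number of primes <= m, via a sieve of Eratosthenes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            flags[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    counts, running = [0] * (limit + 1), 0
    for m in range(limit + 1):
        running += flags[m]
        counts[m] = running
    return counts

MU = {1: 1, 2: -1, 3: -1}      # the Moebius values needed for k = 3

def moebius_estimate(x, k=3):
    """Sum of mu(j)/j * Li(x^(1/j)) for j = 1..k."""
    return sum(MU[j] / j * Li(x ** (1.0 / j)) for j in range(1, k + 1))

counts = prime_counts(10**6)
rows = []
for x in (100000, 500000, 1000000):
    rows.append((x, counts[x], counts[x] - Li(x), counts[x] - moebius_estimate(x)))
for row in rows:
    print(row)
```

Each printed row has the same four columns as Table 25.2.1; the point to look for is that the last column is noticeably smaller in absolute value than the third.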


We can build an interactive table of some results if we are online. 


@interact
def _(k=(3,[2..11])):
    # Moebius estimate using the first k terms
    F = lambda x: sum([Li(x^(1/j))*moebius(j)/j for j in [1..k]])
    # Header row for the table
    T = [['$i$', r'$\pi(i)$', '$Li(i)$', r'$\pi(i)-Li(i)$',
          r'$\pi(i)-\sum_{j=1}^{%s} \frac{\mu(j)}{j} Li(x^{1/j})$'%k]]
    for i in [100000,200000..1000000]:
        T.append([i, prime_pi(i), Li(i).n(digits=7),
                  (prime_pi(i)-Li(i)).n(digits=4),
                  (prime_pi(i)-F(i)).n(digits=4)])
    pretty_print(html(table(T, header_row=True, frame=True)))

After the Prime Number Theorem was proved, mathematicians wanted
to get a better handle on the remaining error between the log integral and
$\pi(x)$. In particular, the Swedish mathematician Helge Von Koch¹ made a very
interesting contribution in 1901.

Conjecture 25.2.2 The (absolute value of the) error in the PNT is less than

$$\frac{1}{8\pi}\sqrt{x}\log(x).$$


This seems to work, broadly speaking. You can try it interactively after
the static graphic.

Figure 25.2.3 Von Koch estimate of error in prime number theorem


@interact
def _(n=(5000,(1000,10^6))):
    P = plot(prime_pi, n-1000, n, color='black', legend_label=r'$\pi(x)$')
    P += plot(Li, n-1000, n, color='green', legend_label='$Li(x)$')
    # Lower and upper Von Koch envelopes around Li(x)
    P += plot(lambda x: Li(x) - 1/(8*pi)*sqrt(x)*log(x), n-1000, n,
              color='blue', linestyle='--', legend_label="Von Koch error estimate")
    P += plot(lambda x: Li(x) + 1/(8*pi)*sqrt(x)*log(x), n-1000, n,
              color='blue', linestyle='--')
    show(P)


Given the observed data, the conjecture seems plausible, if not even open
to improvement. Though we should remember that Li and $\pi$ switch places
infinitely often; see Fact 21.2.6! Of course, a conjecture is not a theorem, but
luckily Von Koch had one of those as well.

Theorem 25.2.4 The truth of the error estimate

$$|\pi(x)-Li(x)|<\frac{1}{8\pi}\sqrt{x}\log(x)$$

for the prime number theorem is equivalent to saying that $\zeta(s)$ equals zero
precisely where Riemann thought it would be zero in 1859 (see Conjecture 25.3.7).

This may seem like an odd statement. After all, $\zeta$ is just about reciprocals
of all numbers, and can’t directly measure primes. (And what do I mean by
“thought it would be”?) But in fact, the original proofs of the PNT also used
the $\zeta$ function in essential ways. So Von Koch was just formalizing the exact
estimate it could give us for the error.

¹ www-history.mcs.st-andrews.ac.uk/Biographies/Koch.html
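To get a feel for how much room the Von Koch estimate leaves, here is a rough plain-Python check (again a sketch of mine, not a Sage cell). It samples $x$ at multiples of $10^4$ up to $10^6$ and compares $|\pi(x)-Li(x)|$ with $\frac{1}{8\pi}\sqrt{x}\log(x)$. The sampling grid, the series used for the log integral, and every helper name are assumptions made for illustration; of course a finite check like this says nothing about the Riemann Hypothesis itself.

```python
# Plain-Python sketch; helper names and the sampling grid are illustrative.
from math import log, sqrt, pi, isqrt

EULER_GAMMA = 0.5772156649015329

def li(x, terms=200):
    """Logarithmic integral for x > 1: gamma + log(log x) + sum of u^k / (k * k!)."""
    u = log(x)
    total, term = EULER_GAMMA + log(u), 1.0
    for k in range(1, terms + 1):
        term *= u / k
        total += term / k
    return total

def Li(x):
    """Offset log integral, from 2 to x (assumed to match Sage's Li)."""
    return li(x) - li(2)

def prime_counts(limit):
    """counts[m] = number of primes <= m."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            flags[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    counts, running = [0] * (limit + 1), 0
    for m in range(limit + 1):
        running += flags[m]
        counts[m] = running
    return counts

def von_koch_bound(x):
    """The conjectured error bound sqrt(x) * log(x) / (8 * pi)."""
    return sqrt(x) * log(x) / (8 * pi)

counts = prime_counts(10**6)
checks = []
for x in range(10**4, 10**6 + 1, 10**4):
    error = abs(counts[x] - Li(x))
    checks.append((x, error, von_koch_bound(x)))

# The sample where the error comes closest to the bound, as a fraction of it
worst = max(checks, key=lambda c: c[1] / c[2])
print(worst)
```

Printing `worst` shows the sampled point where the actual error uses up the largest fraction of the allowed bound, which is one way to see why the conjecture looks “open to improvement” on this range.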


25.3 Toward the Riemann Hypothesis 


Riemann, though, was after bigger fish. He didn’t just want an error term. He
wanted an exact formula for $\pi(x)$, one that could be computed. Computed by
hand, or by machine, if such a machine came along, as close as one pleased. And
this is where $\zeta(s)$ becomes important, because of the Euler product formula:

$$\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_{p}\frac{1}{1-p^{-s}}.$$

Somehow $\zeta$ does encode everything we want to know about prime numbers.
And Riemann’s paper, “On the Number of Primes Less Than a Given
Magnitude”², is the place where this magic really does happen. (The paper is also
available in translation in the appendix of [E.4.4].) Seeing just how it happens
is our goal to close the book.

We’ll begin by plotting $\zeta$, to see what’s going on. As you can see, $\zeta(s)$
doesn’t seem to hit zero very often. Maybe for negative $s$ ...


Figure 25.3.1 The zeta function on [-10,10]
(plot(zeta, -10, 10, ymax=10, ymin=-1))


25.3.1 Zeta beyond the series 


Wait a minute! What was that plot? Shouldn’t $\zeta$ diverge if you put negative
numbers in for $s$? (Recall our definition in Definition 24.2.1.) After all, then
for $s=-1$ we’d get things like

$$\sum_{n=1}^{\infty}n$$

and somehow I don’t think that converges.
But it turns out that we can evaluate $\zeta(s)$ for nearly any complex number
$s$ we desire.

² www.claymath.org/publications/riemanns-1859-manuscript


Figure 25.3.2 Zeta on the complex plane
(graphics_array([complex_plot(zeta, (-20,20), (-20,20)),
complex_plot(lambda z: z, (-3,3), (-3,3))]))


The right-hand graphic gives a color to every point in the complex plane.
The left-hand graphic then color-codes the outputs of $\zeta$ at each point in the
plane by matching them to the appropriate color (as a complex number) for
the output.

The important point here isn’t the picture itself, but that there is a picture.
To wit, $\zeta$ can be defined for (nearly) any complex number as input. Why
would that be the case? One way to see that we could define this function
for complex values comes by trying to define each term $\frac{1}{n^s}=n^{-s}$
more precisely.

Suppose we let $s$ be a complex number, using the long-standing notational
convention

$$s=\sigma+it.$$

Then we can rewrite this term as

$$\frac{1}{n^s}=e^{-s\log(n)}=e^{-(\sigma+it)\log(n)}=e^{-\sigma\log(n)}\cdot e^{-it\log(n)}.$$

Now we use a fact you may remember from calculus, which is very easy to
prove with Taylor series (see Exercise 25.9.1):

$$e^{ix}=\cos(x)+i\sin(x).$$

Applying this, we get

$$\frac{1}{n^s}=e^{-\sigma\log(n)}e^{-it\log(n)}=n^{-\sigma}\left(\cos(t\log(n))-i\sin(t\log(n))\right).$$

Using this analysis, if $\sigma>1$, since cos and sin always have absolute value
less than or equal to one, we still have the same convergence properties as with
regular series. So if we take the imaginary and real parts separately, we can
rewrite

$$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\sum_{n=1}^{\infty}\frac{\cos(t\log(n))}{n^{\sigma}}-i\sum_{n=1}^{\infty}\frac{\sin(t\log(n))}{n^{\sigma}}.$$

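The rewriting above is easy to test numerically. The following plain-Python sketch (my own, with illustrative function names) compares $n^{-s}$ computed by Python’s built-in complex power with the cosine and sine form, and then adds up partial sums of the two real series for a sample point with $\sigma>1$. The sample values $\sigma=2$, $t=3$ and the number of terms are arbitrary choices.

```python
# Plain-Python sketch; function names and sample values are illustrative.
from math import cos, sin, log, pi

def term_via_power(n, s):
    """n^(-s) using Python's built-in complex exponentiation."""
    return n ** (-s)

def term_via_euler(n, sigma, t):
    """n^(-sigma) * (cos(t log n) - i sin(t log n))."""
    return n ** (-sigma) * complex(cos(t * log(n)), -sin(t * log(n)))

def zeta_partial(sigma, t, terms):
    """Partial sum of the real and imaginary series separately (only sensible for sigma > 1)."""
    real = sum(cos(t * log(n)) / n ** sigma for n in range(1, terms + 1))
    imag = -sum(sin(t * log(n)) / n ** sigma for n in range(1, terms + 1))
    return complex(real, imag)

s = complex(2.0, 3.0)
differences = [abs(term_via_power(n, s) - term_via_euler(n, 2.0, 3.0))
               for n in range(1, 50)]
real_case = zeta_partial(2.0, 0.0, 100000)     # t = 0 recovers the usual series for zeta(2)
complex_case = zeta_partial(2.0, 3.0, 100000)
print(max(differences), real_case, complex_case)
```

With $t=0$ the imaginary series vanishes and the real one is the familiar sum for $\zeta(2)$, so the real part of `real_case` should be close to $\pi^2/6$.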
That doesn’t explain the part of the complex plane to the left of $\sigma=1$ of
the picture above. All I will say is that it is possible to extend $\zeta$ there, and
Riemann did it. (In fact, Riemann is largely responsible for advanced complex
analysis.) As an example, $\zeta(-1)=-\frac{1}{12}$, which is very close to saying that

$$\zeta(-1)=1+2+3+4+5+6+7+8+9+10+\cdots=-\frac{1}{12}.$$

zeta(-1)

-1/12

Investigate further whether this has any meaning in Exercise 25.9.2³.


25.3.2 Zeta on some lines 


Let’s get a sense for what the $\zeta$ function looks like. First, observe a
three-dimensional plot of its absolute value for $\sigma$ between 0 and 1 (which will turn out
to be all that is important for our purposes). The code for this is
plot3d(lambda x,y: abs(zeta(x+i*y)), (0,1), (-20,20), plot_points=100) +
plot3d(0, (0,1), (-20,20), color='green', alpha=.5).

Figure 25.3.3 3d plot of Riemann zeta


³ You may wish to view some dueling videos on this topic at Numberphile, a rebuttal, or
another excellent attempt, all on youtube.com on searching for ‘zeta negative one’.

To get a better idea of what is happening, we next compare two different
plots (first static, then interactive). One is a one-dimensional plot of $|\zeta|$ for
different inputs with the same $\sigma$. On the other side is the two-dimensional
colored complex plot of $\zeta(\sigma+it)$, where $\sigma$ is the real part, chosen by you, and
then we plot $t$ out as far as requested. The line which we are viewing on the
complex plane in the first graphic is dashed in the second one.


Figure 25.3.4 Two different 2d plots of Riemann zeta


var('t')
@interact
def _(sig=slider(.01, .99, .01, 0.5, label=r'\(\sigma\)'),
      end=slider(2,100,1,40, label=r'end of \(t\)')):
    # Left: |zeta| along the vertical line with real part sig
    p = plot(lambda t: abs(zeta(sig+t*i)), -end, end,
             rgbcolor=hue(0.7), ymin=0)
    # Right: colored complex plot, with that same line dashed
    q = complex_plot(zeta, (0,.99), (-end,end),
                     aspect_ratio=1/end) + line([(sig,-end),(sig,end)],
                     linestyle='--')
    show(graphics_array([p,q]), figsize=[5,3])


You’ll notice that the only places the function has absolute value zero
(which means the only places it hits zero) are when $\sigma=1/2$.

Remark 25.3.5 It is not really possible to fully visualize a complex function
of complex input. So we often pick some line in the complex plane, such as
where the real part equals 1 (sort of like $x=1$) or where the imaginary part
equals 1 (sort of like $y=1$); then we either treat this as input to a parametric
curve, or similarly look at the output and in one way or another reduce it to
one real number, and plot it in the plane.

Another way to visualize $\zeta$ in a useful way is with the parametric graph
of each vertical line in the complex plane as mapped to the complex plane.
You can think of this as where an infinitely thin slice of the complex plane is
“wrapped” to.

Figure 25.3.6 Plotting a line of Riemann zeta

This image is reasonably famous, because the only time the curve seems to
hit the origin at all is precisely at $\sigma=1/2$, and at $\sigma=1/2$ the curve seems to
hit the origin lots of times. For any other $\sigma$ the curve just misses the origin,
somehow, which I highly encourage you to try interactively below.


@interact
def _(sig=slider(.01, .99, .01, 0.5, label=r'\(\sigma\)')):
    end = 30
    # Image of the vertical line with real part sig, for 0 <= t <= end
    p = parametric_plot((lambda t: zeta(sig+t*i).real(),
                         lambda t: zeta(sig+t*i).imag()), (0,end),
                        rgbcolor=hue(0.7), plot_points=300)
    q = complex_plot(zeta, (0,.99), (0,end),
                     aspect_ratio=1/end) + line([(sig,0),(sig,end)],
                     linestyle='--')
    show(graphics_array([p,q]), figsize=[5,3])


Now it’s true that $\zeta$ is also zero at negative even integer input, but these
are well understood. The pictures demonstrate the mysterious part. And so
we have the following crucial question: where is $\zeta(s)=0$?

Conjecture 25.3.7 Riemann Hypothesis. All the zeros of $\zeta(s)=\zeta(\sigma+it)$
where $t\neq 0$ are on $\sigma=1/2$.

The importance of this problem is evidenced by it having been selected as
one of the seven Millennium Prize⁴ problems by the Clay Math Institute (each
holding a million-dollar award), as well as having many recent popular books
devoted to it⁵. My feeling is that any number theory course should at least
briefly give a taste of its significance, even though the full scope is beyond any
first course.


25.4 Connecting to the Primes 


The last few sections of this final chapter are devoted to seeing why the
Riemann Hypothesis might be related to the distribution of prime numbers. In
this, we will loosely follow the very interesting exposition of Prime Obsession
by John Derbyshire, [E.4.1].

For motivation, think of Von Koch’s result Theorem 25.2.4 connecting the
RH to a bound on the error between $\pi(x)$ and the log integral. Our goal is
more detailed, however.

We’ll pursue this connection in three steps.

1. Our first step is to see the connection between $\pi(x)$ and $\mu(n)$ (25.4.1).

2. Then we’ll see the connection between these and $\zeta$ (25.5).

3. Finally, we’ll see how the zeros of $\zeta$ come into play (25.6).

⁴ claymath.org/millennium/
⁵ Two aimed at starting from scratch for students, not just a general reader, are [E.4.2]
and [E.4.3].


25.4.1 Connecting to Moebius 


Let’s begin by defining a new function. Here is its graph. 


Figure 25.4.1 The J function on two ranges 


If you are reading this online, evaluate the following cell to define it, as well 
as to plot a bit of it in any range you prefer. 


def J(x):
    # Only the first floor(log_2(x)) roots of x are at least 2
    end = floor(log(x)/log(2))
    out = 0
    for j in [1..end]:
        out += 1/j*prime_pi(x^(1/j))
    return out

@interact
def _(end=[20,40..2000]):
    L1 = [(n,J(n)) for n in [1..end]]
    plot_step_function(L1).show()


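Riemann called this function $f$. Following [E.4.4] and [E.4.1], we will call
it $J(x)$. It is very similar to $\pi(x)$ in its definition, so it’s not surprising that it
looks similar.

Definition 25.4.2 We define

$$J(x)=\pi(x)+\frac{1}{2}\pi\left(\sqrt{x}\right)+\frac{1}{3}\pi\left(\sqrt[3]{x}\right)+\frac{1}{4}\pi\left(\sqrt[4]{x}\right)+\cdots=\sum_{n=1}^{\infty}\frac{1}{n}\pi\left(x^{1/n}\right).$$
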
This looks like it’s an infinite sum, but for any given $x$, it is finite. For
instance, let’s calculate $J(20)$:

$$J(20)=\pi(20)+\frac{1}{2}\pi\left(\sqrt{20}\right)+\frac{1}{3}\pi\left(\sqrt[3]{20}\right)+\frac{1}{4}\pi\left(\sqrt[4]{20}\right)=8+\frac{1}{2}\cdot 2+\frac{1}{3}\cdot 1+\frac{1}{4}\cdot 1=9\frac{7}{12}$$

because $\sqrt[5]{20}\approx 1.8$ and $\pi\left(\sqrt[5]{20}\right)=\pi(1.8)=0$, so the sum ends there, and we
can see that on the graph.
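The calculation of $J(20)$ can be reproduced exactly with rational arithmetic. This is a small plain-Python sketch of mine (the Sage cell above does the same job with `prime_pi`); the helper names are illustrative. To avoid floating-point roots, it counts the primes $p$ with $p^n\leq x$, which is the same as $\pi(x^{1/n})$ when $x$ is a positive integer.

```python
# Plain-Python sketch; helper names are illustrative.
from fractions import Fraction

def primes_up_to(limit):
    """Simple sieve of Eratosthenes; returns all primes <= limit."""
    flags = [True] * (limit + 1)
    primes = []
    for m in range(2, limit + 1):
        if flags[m]:
            primes.append(m)
            for multiple in range(m * m, limit + 1, m):
                flags[multiple] = False
    return primes

def pi_of_root(x, n, primes):
    """pi(x^(1/n)) for a positive integer x: count primes p with p**n <= x."""
    return sum(1 for p in primes if p ** n <= x)

def J(x):
    """Riemann's J for a positive integer x, as an exact fraction."""
    primes = primes_up_to(x)
    total = Fraction(0)
    n = 1
    while 2 ** n <= x:          # beyond this, x^(1/n) < 2 and pi(...) = 0
        total += Fraction(pi_of_root(x, n, primes), n)
        n += 1
    return total

print(J(20))
```

The loop stops after $n=4$ for $x=20$, matching the observation that the fifth root of 20 is already below 2.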

Okay, so we have this new function. Yet another arithmetic function. So 
what? 

Ah, but what have we been doing to all our arithmetic functions to see 
what they can do, to get formulas for them? We’ve been Moebius inverting 
them, naturally! (Recall Section 23.2.) In this case, Moebius inversion could 
be really great, since it would give us information about the thing being added, 
which is the all-important $\pi(x)$.

The only thing standing in our way is that

$$J(x)=\sum_{n=1}^{\infty}\frac{1}{n}\pi\left(x^{1/n}\right)$$



is not a sum over divisors. But it turns out that, just like when we took the
limits of the sum over divisors $\sum_{d\mid n}\frac{1}{d}$ we got $\sum_{n=1}^{\infty}\frac{1}{n}$, we can do the same
thing with Moebius inversion.


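Fact 25.4.3 If $\sum_{n=1}^{\infty}f(x/n)$ and $\sum_{n=1}^{\infty}g(x/n)$ both converge absolutely, then

$$g(x)=\sum_{n=1}^{\infty}f(x/n)\iff f(x)=\sum_{n=1}^{\infty}\mu(n)g(x/n).$$

We can use this by letting $g=J$ with $f(x/n)=\frac{1}{n}\pi\left(x^{1/n}\right)$. Applying this,
we achieve a very important result writing $\pi(x)$ in terms of $J$:

$$\pi(x)=\sum_{n=1}^{\infty}\frac{\mu(n)}{n}J\left(x^{1/n}\right)=J(x)-\frac{1}{2}J\left(\sqrt{x}\right)-\frac{1}{3}J\left(\sqrt[3]{x}\right)-\frac{1}{5}J\left(\sqrt[5]{x}\right)+\frac{1}{6}J\left(\sqrt[6]{x}\right)+\cdots$$
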
Remark 25.4.4 This is the usual argument, but note that $f(x/n)=\frac{1}{n}\pi\left(x^{1/n}\right)$
is really a function of $x$ and $n$, not just $x/n$. The inversion can be justified
somewhat differently via a footnote in Edwards’ discussion of this matter in
[E.4.4], but it’s worth noting this hurdle can be overcome more directly⁶; see
Exercise 25.9.6 for one direction.
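Whatever route one takes to justify it, the inverted formula for $\pi(x)$ in terms of $J$ can at least be checked exactly for small integer $x$. Here is a plain-Python sketch of mine using exact fractions; the function names are illustrative, and $J(x^{1/n})$ is computed by counting primes $p$ with $p^{nm}\leq x$ rather than by taking floating-point roots.

```python
# Plain-Python sketch; helper names are illustrative.
from fractions import Fraction

def primes_up_to(limit):
    """Simple sieve of Eratosthenes; returns all primes <= limit."""
    flags = [True] * (limit + 1)
    primes = []
    for m in range(2, limit + 1):
        if flags[m]:
            primes.append(m)
            for multiple in range(m * m, limit + 1, m):
                flags[multiple] = False
    return primes

def moebius(n):
    """Moebius function by trial division (fine for the small n used here)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0        # a squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def pi_of_root(x, k, primes):
    """pi(x^(1/k)) for a positive integer x: count primes p with p**k <= x."""
    return sum(1 for p in primes if p ** k <= x)

def J_of_root(x, n, primes):
    """J(x^(1/n)) = sum over m of (1/m) * pi(x^(1/(n*m)))."""
    total, m = Fraction(0), 1
    while 2 ** (n * m) <= x:
        total += Fraction(pi_of_root(x, n * m, primes), m)
        m += 1
    return total

def pi_from_J(x):
    """Sum over n of mu(n)/n * J(x^(1/n)); only finitely many terms are nonzero."""
    primes = primes_up_to(x)
    total, n = Fraction(0), 1
    while 2 ** n <= x:
        total += Fraction(moebius(n), n) * J_of_root(x, n, primes)
        n += 1
    return total

results = {x: pi_from_J(x) for x in (20, 100, 1000)}
print(results)
```

Because everything is done with `Fraction`, the fractional pieces of the various $J$ values cancel exactly and each result is a whole number.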


25.5 Connecting to Zeta 


25.5.1 Turning the golden key 


Now, this looks just as hopeless as before. How is $J$ going to help us calculate
$\pi$, if we can only calculate $J$ in terms of $\pi$ anyway?

Here is where Riemann “turns the Golden Key”, as Derbyshire puts it.
Because $\zeta$ has an Euler product over the set of primes, we can just possibly
connect it to each prime. It turns out this will in fact connect $\zeta$ to $J$. This is
the goal of the rest of the current section.

In the next section, we will see how the zeros of $\zeta$ give us an exact formula
for $J$; then we will finally plug $J$ back into the Moebius-inverted formula for
$\pi$ to get an exact formula for $\pi$ in Section 25.7. Here is a plot of that formula,
as a foretaste.

⁶ Thanks to Zach Teitler for this point.


Figure 25.5.1 $\pi(x)$, $Li(x)$, and something better than Li


We can see above that this has the potential to be a very good
approximation, even given that I did limited calculations here. The most interesting
thing is the gentle waves you should see; this is quite different from the other
types of approximations we had, and seems to have the potential to mimic
the more abrupt nature of the actual $\pi(x)$ function much better in the long
run. (See [E.4.3] for more details along these lines, connecting to Fourier series,
which we will not pursue.)


25.5.2 Detailing the connections 

Now let’s connect $J$ and $\zeta$. Recall the Euler product for $\zeta$ again:

$$\zeta(s)=\prod_{p}\frac{1}{1-p^{-s}}$$

The trick to getting information about primes out of this, as well as
connecting to $J$, is to take the logarithm of the whole thing. This will turn the
product into a sum, something we can work with much more easily⁷:

$$\log(\zeta(s))=\sum_{p}\log\left(\frac{1}{1-p^{-s}}\right)=\sum_{p}-\log\left(1-p^{-s}\right)$$



Adding just fractions would have perhaps allowed using a geometric series 
to make this a sum, but what could we do with a sum of logarithms? 


Question 25.5.2 What can we do with $-\log()$ of some sum, not a product?


⁷ This reminds me of the old joke about Noah’s ark and logarithms. So, after the ark
lands, all the animals are ... having baby animals, let’s say. Except the snakes. No baby
snakes. Noah asks what the problem is; they seem to be missing the point. Snakes say, no
worries, just give us a wooden bench or sawhorse or something. Noah wonders what’s up,
but gives it to them. Next morning, tons of baby snakes! Naturally Noah has to ask where
the magic was. “Simple; adders need a log table to multiply.”
Solution. We can use its Taylor series!

$$-\log(1-x)=\sum_{k=1}^{\infty}\frac{x^k}{k}$$

So we plug it in:

$$\log(\zeta(s))=\sum_{p}\sum_{k=1}^{\infty}\frac{\left(p^{-s}\right)^k}{k}$$

Now we will manipulate this in two big steps. First we’ll rewrite the fraction 
as an integral, and then we will try to somehow add up the integrals. 

Standard improper integral work (Exercise 25.9.3) from second-semester 
calculus shows that we can rewrite the summands: 


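$$\frac{\left(p^{-s}\right)^k}{k}=\frac{s}{k}\int_{p^k}^{\infty}x^{-s-1}\,dx.$$

That means we can rewrite the logarithm of $\zeta$ as

$$\log(\zeta(s))=\sum_{p}\sum_{k=1}^{\infty}\frac{\left(p^{-s}\right)^k}{k}=s\sum_{p}\sum_{k=1}^{\infty}\int_{p^k}^{\infty}\frac{1}{k}x^{-s-1}\,dx.$$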
Pp p k=l 


This is a very large sum of integrals. We can rewrite this as a single integral, 
but we will need to pay close attention. 

First, we can unify all these integrals from $p^k$ to $\infty$ by making them all
have the same endpoints. This is done somewhat artificially, by writing

$$\int_{p^k}^{\infty}\frac{1}{k}x^{-s-1}\,dx=\int_{1}^{p^k}0\cdot x^{-s-1}\,dx+\int_{p^k}^{\infty}\frac{1}{k}x^{-s-1}\,dx.$$

This yields the integral of a piecewise-defined function, but for every $k$ and
$p$ it is defined from 1 to $\infty$.

Now comes the most surprising part. What function would I get if I added
up all those integrals in the double sum for $\log(\zeta(s))$,
$s\sum_{p}\sum_{k=1}^{\infty}\int_{p^k}^{\infty}\frac{1}{k}x^{-s-1}\,dx$? To see
this, let us add up all of the piecewise integrands, organizing by the powers $k$
for any given prime $p$.

1. Whenever $x$ reaches $p^1=p$, the sum of all those functions would add
$x^{-s-1}$. Adding up all of these for all $p$ means the total function would
include $\pi(x)x^{-s-1}$.

2. Whenever $x$ reaches $p^2$, the sum of all those functions would add $\frac{1}{2}x^{-s-1}$.
This, however, is the same thing as when $\sqrt{x}$ hits a prime, so we can add
it to the previous point. The total function would include
$\frac{1}{2}\pi\left(\sqrt{x}\right)x^{-s-1}$.

3. When $x$ reaches a cube of a prime, the sum adds $\frac{1}{3}x^{-s-1}$. This is the
same thing as adding a new part when $\sqrt[3]{x}$ hits a prime, that is adding
$\frac{1}{3}\pi\left(\sqrt[3]{x}\right)x^{-s-1}$.

And so forth for each $k$. In short, adding up all these piecewise integrands
seems to give a big integrand

$$\left(\pi(x)+\frac{1}{2}\pi\left(\sqrt{x}\right)+\frac{1}{3}\pi\left(\sqrt[3]{x}\right)+\cdots\right)x^{-s-1}.$$

But this sum of all the piecewise integrands is $J(x)$, multiplied by $x^{-s-1}$.
Hence

$$\log(\zeta(s))=s\sum_{p}\sum_{k=1}^{\infty}\int_{p^k}^{\infty}\frac{1}{k}x^{-s-1}\,dx=s\int_{1}^{\infty}J(x)x^{-s-1}\,dx.$$

This completes our connection of $\zeta$ and $J$.

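Since $J$ is a step function that only changes at prime powers, the integral $s\int_1^{\infty}J(x)x^{-s-1}\,dx$ can be approximated without any numerical integration at all. In the plain-Python sketch below (my own, with illustrative names), the integral is cut off at a finite limit and evaluated exactly on each unit interval, where $J$ is constant; the result for $s=2$ is compared with $\log(\pi^2/6)$. The cutoff of $200000$ is an arbitrary assumption, so expect agreement only to several decimal places.

```python
# Plain-Python sketch; helper names and the cutoff are illustrative.
from math import log, pi

def J_values(limit):
    """Return a list with J(m) for 0 <= m <= limit, built from jumps of 1/k at each p^k."""
    is_prime = [True] * (limit + 1)
    jumps = [0.0] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
            power, k = p, 1
            while power <= limit:
                jumps[power] += 1.0 / k
                power *= p
                k += 1
    values, running = [0.0] * (limit + 1), 0.0
    for m in range(limit + 1):
        running += jumps[m]
        values[m] = running
    return values

def truncated_integral(s, limit):
    """s * (integral from 1 to limit of J(x) x^(-s-1) dx), exact on each unit step."""
    J = J_values(limit)
    total = 0.0
    for m in range(1, limit):
        # J is constant on [m, m+1), and s * int x^(-s-1) dx there is m^(-s) - (m+1)^(-s)
        total += J[m] * (m ** (-s) - (m + 1) ** (-s))
    return total

approx = truncated_integral(2.0, 200000)
exact = log(pi ** 2 / 6)
print(approx, exact)
```

As a side benefit, `J_values(20)[20]` gives a floating-point version of the value $J(20)=9\frac{7}{12}$ computed earlier.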
25.6 Connecting to Zeros 


25.6.1 Where are the zeros? 


Our next goal is to see how this connection

$$\log(\zeta(s))=s\int_{1}^{\infty}J(x)x^{-s-1}\,dx$$

relates to the zeros of the $\zeta$ function (and hence the Riemann Hypothesis).


L = lcalc.zeros_in_interval(10,100,0.1)
[l[0] for l in L]

[14.1347251, 21.0220396, ..., 98.8311942]

We see all the zeros for $\sigma=1/2$ between 0 and 100; there are 29 of them.

We will connect to $\zeta$ by means of a very powerful analogy, the one which
Euler used to prove $\zeta(2)=\frac{\pi^2}{6}$ (see the end of Subsection 20.4.2) and which,
correctly done, does yield the right answer.

Begin the analogy by recalling basic algebra. The Fundamental Theorem 
of Algebra states that every polynomial factors over the complex numbers. For 
instance, 

$$f(x)=5x^3-5x=5(x-0)(x-1)(x+1).$$

If we take the logarithm of such a factorization, we can say things like

$$\log(f(x))=\log(5)+\log(x-0)+\log(x-1)+\log(x+1).$$


Then if it turned out that $\log(f(x))$ was useful to us for some other reason $R$,
it would be reasonable to say that we can get information about the
otherwise-mysterious $R$ from adding up information about the zeros of $f$ (and the
constant 5), because of the addition of $\log(x-r)$ for all the roots $r$.

You can’t really do this with arbitrary functions, of course. Disappointingly,
$\zeta$ is definitely a function where this doesn’t work, mostly because $\zeta(1)$ diverges
so badly, no matter how you define the complex version of $\zeta$.

But it so happens that $\zeta$ is very close to a function you can analyze this
way, $(s-1)\zeta(s)$. Applying the logarithm factoring idea to $(s-1)\zeta(s)$ (and
doing lots of relatively hard complex integrals, or some other formal business
with difficult convergence considerations) allows us to essentially invert the
equation

$$\log(\zeta(s))=s\int_{1}^{\infty}J(x)x^{-s-1}\,dx$$

to this even more surprising formula (whose notation we explain below):

$$J(x)=Li(x)-\sum_{\rho}Li\left(x^{\rho}\right)-\log(2)+\int_{x}^{\infty}\frac{dt}{t(t^2-1)\log(t)}\qquad(25.6.1)$$

25.6.2 Analyzing the connection 


It is hard to overestimate the importance of the formula (25.6.1). Each piece
comes from something inside $\zeta$ itself, inverted in this special way.

• First, $Li(x)$ is still the log integral, and comes from the fact that we
needed $(s-1)\zeta(s)$ to apply this inversion, not just $\zeta(s)$. In fact, this
particular inversion can be seen by integrating, as it’s true that

$$s\int_{1}^{\infty}Li(x)x^{-s-1}\,dx=-\log(s-1)$$

so one can see that $s-1$ and Li seem to correspond.

• Second, each $Li(x^{\rho})$ comes from each of the zeros $\rho$ of $\zeta$ on the line
$\sigma=1/2$ in the complex plane. This is the part which most closely
corresponds to the factoring.

• The constant term $\log(2)$ comes from the constant when you do the
factoring, similarly to the 5 in the example above using $f(x)=5x^3-5x$.

• Finally, the integral in (25.6.1) comes from the zeros of $\zeta$ at $-2n$ we
mentioned just before the statement of Conjecture 25.3.7.


To give you a sense of how complicated (25.6.1) really is, here is a plot of
just one small piece of it.

Figure 25.6.1 Plot of $Li\left(20^{1/2+it}\right)$

This is the plot of $Li\left(20^{1/2+it}\right)$ up through the first zero of $\zeta$ above the real
axis. It’s beautiful, but also forbidding. After all, if it takes that much twisting
and turning to get to Li of the first zero, what is in store if we have to add up
over all infinitely many of them to calculate $J(20)$?

So at the very least, it would be helpful to know where all of those myste- 
rious zeros live! This is why the Riemann Hypothesis is so important; it pins 
them down quite dramatically. 


25.7 The Riemann Explicit Formula 


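Now we are finally ready to see Riemann’s result, by plugging in the formula
(25.6.1) for $J$ into the Moebius inverted formula for $\pi$ we saw just before
Remark 25.4.4:

$$\pi(x)=J(x)-\frac{1}{2}J\left(\sqrt{x}\right)-\frac{1}{3}J\left(\sqrt[3]{x}\right)-\frac{1}{5}J\left(\sqrt[5]{x}\right)+\frac{1}{6}J\left(\sqrt[6]{x}\right)+\cdots$$
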

It is true that Riemann did not prove the following formula fully rigorously,
and indeed one of the provers of the Prime Number Theorem mentioned taking
decades as part of that effort just to prove all the statements Riemann made
in this one paper. Nonetheless, it is certainly Riemann's formula for π(x), and
an amazing one:


Fact 25.7.1 Riemann explicit formula.

$$\pi(x) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n}\left[ Li(x^{1/n}) - \sum_{\rho}\left( Li(x^{\rho/n}) + Li(x^{\bar{\rho}/n}) \right) + \int_{x^{1/n}}^{\infty} \frac{dt}{t(t^2-1)\log(t)} \right]$$

It is worth making two points about the transition to this formula. First,
if you're wondering where the log(2) from (25.6.1) went, it went to 0 because
$\sum_{n=1}^{\infty} \mu(n)/n = 0$, though this is very hard to prove. (In fact, it is a consequence
of the Prime Number Theorem; see Exercise 25.9.5.)
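To get a feel for how slowly the sum of μ(n)/n settles toward 0, one can compute partial sums. This is a minimal sketch in plain Python rather than Sage; the function names `mobius_sieve` and `partial_sum` are invented for the illustration, and a numerical check like this proves nothing about the limit.

```python
def mobius_sieve(limit):
    """Return a list mu with mu[n] equal to the Moebius function of n, 1 <= n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            # every multiple of p picks up one more prime factor
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            # multiples of p^2 are not squarefree
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu

def partial_sum(limit):
    """Sum of mu(n)/n for 1 <= n <= limit, in floating point."""
    mu = mobius_sieve(limit)
    return sum(mu[n] / n for n in range(1, limit + 1))

for N in (10, 10**3, 10**5):
    print(N, partial_sum(N))
```

The partial sums wander on both sides of zero as N grows, which is one reason the limit is so hard to establish directly.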

Secondly, each ρ is a zero above the real axis, and then $\bar{\rho}$ is the
corresponding one below the real axis. The summation is over every single zero not on the
real axis. In particular, these ρ are conjectured by the Riemann Hypothesis to
all have real part equal to 1/2, which would make things particularly tidy.

Now let’s see this formula in action. 


Figure 25.7.2 π(x), Li(x), and something better than Li


This graphic shows just how good it can get. Again, notice the waviness,
which allows it to approximate π(x) not just once per "step" of the function,
but along the steps. Try it out interactively below (where we make it somewhat
less accurate for the sake of computational speed).


import mpmath
var('y')
# Zeros of zeta with height between 10 and 50; the first component of
# each entry is the imaginary part of a zero
L = lcalc.zeros_in_interval(10,50,0.1)

@interact
def _(n=(100,(60,10^3))):
    P = plot(prime_pi, n-50, n, color='black',
             legend_label=r'$\pi(x)$')
    P += plot(Li, n-50, n, color='green',
              legend_label='$Li(x)$')
    # Smooth terms only: sum of mu(j)/j * Li(x^(1/j)) for j = 1, 2, 3
    G = lambda x: sum([mpmath.li(x^(1/j)) * moebius(j)/j
                       for j in [1..3]])
    P += plot(G, n-50, n, color='red', legend_label=
              r'$\sum_{j=1}^{%s} \frac{\mu(j)}{j} Li(x^{1/j})$'%3)
    # Truncated explicit formula; Li(x^(rho/j)) is computed as
    # Ei(rho*log(x)/j), and the zero terms carry the same mu(j)/j
    # weight as in Fact 25.7.1
    F = lambda x: (
        sum([(mpmath.li(x^(1/j)) - log(2) +
              numerical_integral(1/(y*(y^2-1)*log(y)), x^(1/j), oo)[0]
             )*moebius(j)/j for j in [1..3]])
        - sum([(mpmath.ei(log(x)*((0.5+l[0]*i)/j)) +
                mpmath.ei(log(x)*((0.5-l[0]*i)/j))).real * moebius(j)/j
               for l in L for j in [1..3]]))
    P += plot(F, n-50, n, color='blue',
              legend_label='Really good estimate', plot_points=50)
    show(P)


We can also just check out some numerical values.


import mpmath
var('y')
# This time use the zeros with height between 10 and 300
L = lcalc.zeros_in_interval(10,300,0.1)
# Same truncated explicit formula as in the plot above
F = lambda x: (
    sum([(mpmath.li(x^(1/j)) - log(2) +
          numerical_integral(1/(y*(y^2-1)*log(y)), x^(1/j), oo)[0]
         )*moebius(j)/j for j in [1..3]])
    - sum([(mpmath.ei(log(x)*((0.5+l[0]*i)/j)) +
            mpmath.ei(log(x)*((0.5-l[0]*i)/j))).real * moebius(j)/j
           for l in L for j in [1..3]]))

@interact
def _(n=300):
    print(F(n))
    print(prime_pi(n))
    print(Li(n.n()))
    print(Li(n.n()) - 1/2*Li(sqrt(n.n())) -
          1/3*Li((n.n())^(1/3)))
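For readers without Sage at hand, the smooth part of the estimate (the sum over n = 1, 2, 3 of μ(n)/n · li(x^{1/n}), with no zeros of ζ involved) can be sketched in plain Python. This is an illustration under stated assumptions, not the book's code: `li` here is the logarithmic integral taken from 0 (what `mpmath.li` computes), evaluated by a standard series, and the names `li`, `smooth_estimate`, and `prime_count` are made up for the example.

```python
import math

EULER_GAMMA = 0.5772156649015329

def li(x, terms=200):
    """Logarithmic integral li(x) for x > 1, using the series
    li(x) = gamma + log(log x) + sum_{k>=1} (log x)^k / (k * k!)."""
    u = math.log(x)
    total = EULER_GAMMA + math.log(u)
    term = 1.0                    # holds u^k / k! as k increases
    for k in range(1, terms + 1):
        term *= u / k
        total += term / k
    return total

def smooth_estimate(x):
    """li(x) - li(x^(1/2))/2 - li(x^(1/3))/3, i.e. the terms n = 1, 2, 3
    of the sum of mu(n)/n * li(x^(1/n)), using mu(1)=1, mu(2)=mu(3)=-1."""
    return li(x) - li(x ** 0.5) / 2 - li(x ** (1.0 / 3)) / 3

def prime_count(x):
    """pi(x) by trial division; slow, but enough for a small table."""
    return sum(1 for n in range(2, int(x) + 1)
               if all(n % d for d in range(2, int(n ** 0.5) + 1)))

for x in (100, 300, 1000):
    print(x, prime_count(x), round(li(x), 3), round(smooth_estimate(x), 3))
```

The series for li converges quickly for the sizes of x used here; for much larger x a different method would be a better choice.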


Many wonderful facts would follow from the truth of the Riemann Hypothesis,
or from a natural generalization.


Fact 25.7.3 Consequences of the (generalized) Riemann Hypothesis.
The following follow from the Riemann Hypothesis or a generalization for things
like general Dirichlet series.


• The Dirichlet series of the Möbius function would be the multiplicative
inverse of the zeta function for lots more complex values than just the
real ones we proved it for.


• The value (not just average) of σ(n) would have the following bound once
n is big enough:

$$\sigma(n) < e^{\gamma}\, n \log(\log(n))$$


• The biggest gap between consecutive prime numbers could not be too big
(to be precise, O(√p log(p))).


• We would know exactly what it means for a congruence class of primes to
win the 'prime races' (see Section 22.1).


• Artin's conjecture (Conjecture 17.5.8) on primitive roots follows from a
generalization as well.
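The bound on σ(n) in the list above is usually called Robin's inequality, and "big enough" is known to mean n > 5040 if the Riemann Hypothesis holds. A small search makes this concrete. The sketch below is plain Python with invented names (`sigma_table`, `robin_exceptions`); it only checks a finite range and so says nothing about the Hypothesis itself.

```python
import math

EULER_GAMMA = 0.5772156649015329

def sigma_table(limit):
    """sigma[n] = sum of the positive divisors of n, for 1 <= n <= limit."""
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for m in range(d, limit + 1, d):
            sigma[m] += d
    return sigma

def robin_exceptions(limit):
    """All n with 3 <= n <= limit where sigma(n) >= e^gamma * n * log(log(n)).
    (We start at 3 because log(log(n)) is not positive for n <= 2.)"""
    sigma = sigma_table(limit)
    c = math.exp(EULER_GAMMA)
    return [n for n in range(3, limit + 1)
            if sigma[n] >= c * n * math.log(math.log(n))]

exceptions = robin_exceptions(20000)
print(exceptions)
print("largest exception found:", max(exceptions))
```

Every exception found is a small number with many divisors, and none appear in this range past 5040.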


So can you prove that there are no zeros other than those on the
critical line to contribute to these approximations to π(x)? If so, welcome to
the future of number theory!


25.8 Epilogue


The Riemann zeta function and counting primes are truly only the beginning of
research in modern number theory. Let's see just a little more of its future.

For instance, research in finding and counting points on curves (as in
Chapter 15) leads to more complicated series like ζ, called L-functions. There
is a version of the Riemann Hypothesis for them, too (see Fact 25.7.3 for
some connections). Even without that, they give truly interesting, strange,
and beautiful results, particularly when counting points on the elliptic curves
we mentioned at various points in the text; a notable success of this was in the
proof of Fermat's Last Theorem. You may wish to continue with books like
[E.4.19] or [E.4.5, Section 12.4] or [E.4.24, Chapters 13-15], or perhaps start
doing Exercise 25.9.11 with an internet search.

Here is a recent result of interest. Recall from Example 14.2.3 that the
notation r_12(n) should denote the number of ways to write n as a sum of twelve
squares. Here, order and sign both matter, so (1,2) and (2,1) and (−2,1) are
all different.


Theorem 25.8.1 As we let p run through the set of all prime numbers, the
distribution of the fraction

$$\frac{r_{12}(p) - 8(p^5+1)}{32p^{5/2}}$$

is precisely as this circular function in the long run:

$$\frac{2}{\pi}\sqrt{1-t^2}$$

Proof. Needless to say, this result is far beyond the level of this text, but
maybe you will make the next contribution? Initially this result is a corollary
of the proof of the Sato-Tate conjecture⁸ by Barnet-Lamb, Geraghty, Harris,
and Taylor; that proof crucially used the so-called "Fundamental Lemma⁹" of
Gérard Laumon and Ngô Bảo Châu, the latter of whom won the Fields Medal
based on proving it in very full generality. ∎
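The fraction in Theorem 25.8.1 can be computed directly for small primes by counting representations as sums of twelve squares. The following plain-Python sketch (function names `r12_table` and `sato_tate_values` are invented here) multiplies the series 1 + 2q + 2q^4 + 2q^9 + ⋯ by itself twelve times; it is a brute-force illustration, not the method behind the theorem.

```python
def r12_table(limit):
    """r[n] = number of ways to write n as an ordered sum of twelve squares
    of integers, with signs counted, for 0 <= n <= limit."""
    # Nonzero coefficients of sum over all integers k of q^(k^2), truncated
    squares = [(0, 1)]
    k = 1
    while k * k <= limit:
        squares.append((k * k, 2))      # both k and -k give k^2
        k += 1
    r = [0] * (limit + 1)
    r[0] = 1                            # zero squares: only the empty sum
    for _ in range(12):                 # add one square at a time
        new = [0] * (limit + 1)
        for i, a in enumerate(r):
            if a:
                for s, c in squares:
                    if i + s > limit:
                        break
                    new[i + s] += a * c
        r = new
    return r

def sato_tate_values(limit):
    """The fraction (r12(p) - 8(p^5 + 1)) / (32 p^(5/2)) for primes p <= limit."""
    r = r12_table(limit)
    primes = [n for n in range(2, limit + 1)
              if all(n % d for d in range(2, int(n ** 0.5) + 1))]
    return [(p, (r[p] - 8 * (p ** 5 + 1)) / (32 * p ** 2.5)) for p in primes]

for p, w in sato_tate_values(60):
    print(p, round(w, 4))
```

All of the printed values lie between −1 and 1, which is the interval where the circular function lives; seeing the distribution itself takes far more primes than this.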


Sage note 25.8.2 Into the future. The following graphic is based on one 
due to William Stein, the original founder and developer of Sage, in personal 
communication. 


⁸ en.wikipedia.org/wiki/Sato–Tate_conjecture
⁹ en.wikipedia.org/wiki/Fundamental_Lemma_(Langlands_program)

Figure 25.8.3 Distribution of modified twelve squares fraction (Sato-Tate)


Try it interactively below. The higher the number, the closer the values 
should group to the distribution; change the number of bins in the histogram 
to see it more clearly. 


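import math

def sqrt2():
    # The limiting "circular" density (2/pi)*sqrt(1-x^2) on [-1,1]
    PI = float(pi)
    return plot(lambda x: (2/PI)*math.sqrt(1-x^2), -1, 1,
                plot_points=200,
                rgbcolor=(0.3,0.1,0.1), thickness=2)

# q-expansion of the Delta function up to q^(10^5)
delta = delta_qexp(10^5)

@interact
def delta_dist(bins=(20,[10..150]), number =
               [500,1000,..,delta.prec()]):
    # Keep the coefficients up to index number
    D = delta[:number+1]
    # Normalize the pth coefficient by 2*p^(11/2) so it lies in [-1,1]
    w = [float(D[p])/(2*float(p)^(5.5)) for p in
         prime_range(number + 1)]
    show(histogram(w, bins=bins, density=True) + sqrt2(),
         frame=True, gridlines=True)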


What an amazing result. These ideas are at the forefront of all types of 
number theory research today, and my hope is that you will enjoy exploring 
more of it, both with paper and pencil and using tools like Sage! 


25.9 Exercises 


1. Prove that e^{ix} = cos(x) + i sin(x) using Taylor series. Try to include proofs
of the convergence of everything involved.


2. Many books have a chain of reasoning interpreting the value ζ(−1) =
−1/12. Find a physical one and summarize the argument. (The Specialized
References and Other References may have some suggestions.) Do you
buy that adding all positive integers could possibly have a meaning?

3. Show all details for the improper integrals in Section 25.5. You may wish
to have a refresher¹⁰ from any calculus textbook.

4. Differentiate the function h(x) = x^x. Why is this question appropriate
for this chapter?


5. Verify numerically that $\sum_{n=1}^{\infty} \mu(n)/n \to 0$; first try a calculator, then a
computer. How close can you get to zero before your computer gives up?

6. Justify the comment in Remark 25.4.4. That is, if f, g have domain of the
positive reals and both $\sum_{n=1}^{\infty} f(x^{1/n})/n$ and $\sum_{n=1}^{\infty} g(x^{1/n})/n$ converge
absolutely, show that when $g(x) = \sum_{n=1}^{\infty} f(x^{1/n})/n$ we have $f(x) =
\sum_{n=1}^{\infty} \mu(n) g(x^{1/n})/n$. Hint: Substitute g into $\sum_{m=1}^{\infty} \mu(m) g(x^{1/m})/m$,
yielding a double sum in n, m with $f(x^{1/(mn)})$; now carefully switch the
sum to be over k = mn and d | k, ending with a sum where nearly
everything cancels out due to Proposition 23.1.5.

7. See Exercise Group 24.7.9-10 for the definition of η(s), the Dirichlet eta
function. Investigate whether there is a statement about this function 
which is logically equivalent to the Riemann hypothesis. 

8. Read one of the several excellent introductions to the Riemann Hypothesis 
intended for the “general reader”. (Some are listed in the Specialized 
References. ) 


Exercise Group. A natural next direction to explore is the notion of elliptic 
curves. These exercises will help you think about what you find interesting 
about them! 


9. How are elliptic curves used in cryptography? (Peruse Chapters 11-12 
for references. ) 


10. Find out more about Mordell’s Theorem and its connection to this 
chapter and/or to Fermat’s Last Theorem. 


11. What is the Birch-Swinnerton-Dyer Conjecture? Find out as much 
about it as you can. (See the Specialized References, for instance.) 


12. Answer one of these questions, or all of them.
• What is a partition of a number?
• What are continued fractions?
• What is a number field?

13. What else do you want to know about numbers? What are you inspired
to discover?


Summary: Further Up and Further In 


The final chapter in the book gives just a sense of possibly the most impor- 
tant open question in mathematics. 


1. In Section 25.1 and Section 25.2, we begin the process of asking how to 
improve our estimates of primes. 


2. The next section gives us enough background (and pictures!) to under- 
stand at least the gist of the Riemann Hypothesis, one of the Millennium 
Prize Problems. 


3. Sections 25.4, 25.5, and 25.6 all lead up to seeing the Riemann explicit 
formula in Section 25.7. 


¹⁰ activecalculus.org/single/sec-6-5-improper.html

4. The Epilogue reminds us that this book is just the beginning.


The Exercises lead you even further into the future of your number theory 
exploration! 


Appendix A 


List of Sage notes 


There are many great Sage references. But for the convenience of users of this 
text, we collect all the many Sage notes from the text here in one place. 


Sage note 1.5.1     About Sage notes
Sage note 1.5.2     Using commands in Sage cells
Sage note 2.1.2     Counting begins at zero
Sage note 2.1.3     Repeating commands for different input
Sage note 2.4.5     Remind how to get list elements
Sage note 4.2.1     Timing your work
Sage note 4.2.2     Numbers too big for a computer
Sage note 4.2.4     Give things names
Sage note 4.2.5     Making tuples
Sage note 4.2.6     Types matter
Sage note 4.5.2     Checking equality
Sage note 4.6.2     List comprehensions
Sage note 5.3.8     Getting interactive Sage help
Sage note 5.4.3     Printing it out
Sage note 6.1.3     Making comments
Sage note 8.2.2     Colorful options
Sage note 9.1.6     Reminder to try things out
Sage note 9.3.2     Euler phi in Sage
Sage note 9.3.3     More complex list comprehension
Sage note 10.0.2    Reminder for colormaps
Sage note 10.1.3    Filtering list comprehensions
Sage note 10.2.2    How Sage does primitive roots
Sage note 10.5.5    Reminder on equality
Sage note 11.1.1    Definitions
Sage note 11.1.2    Always evaluate your definitions
Sage note 11.2.1    Reminder to evaluate definitions
Sage note 11.3.1    Another reminder to evaluate definitions
Sage note 11.3.4    Compute what you need
Sage note 11.3.6    Change values right in the code
Sage note 11.5.1    We keep reminding you
Sage note 11.6.1    A final reminder to evaluate definitions
Sage note 12.4.8    Reminder about timing
Sage note 12.5.2    Trying your primes yourself
Sage note 12.5.5    Code for trial division
Sage note 12.6.8    Building interacts
Sage note 13.1.3    Handling errors
Sage note 13.4.15   Examining code is good for you
Sage note 16.2.2    Commands of more sophistication
Sage note 16.3.3    Quadratic residues
Sage note 17.1.6    Check your work
Sage note 17.4.11   Names of functions may vary
Sage note 18.2.4    Review quiz
Sage note 18.2.6    Explore here
Sage note 19.2.2    Syntax for sigma
Sage note 20.2.2    Try to be efficient
Sage note 21.1.2    Syntax for counting primes
Sage note 21.1.3    Cython
Sage note 21.1.4    Not all algorithms are equal
Sage note 21.4.4    Python can do math too
Sage note 22.3.11   Sage can change
Sage note 23.1.6    Check your work again
Sage note 25.8.2    Into the future


Appendix B 


List of Historical Remarks 


For convenience, below we collect some of the short historical remarks in the 
text. We hope these, and the many places where the history was part and 
parcel of the main text, whetted the reader’s appetite for more investigations! 

There is a huge number of places to learn about the history of number 
theory — including many of the books and articles in the References and Further 
Resources, especially the Historical References. Another excellent compendium 
of resources about mathematics history more generally is the MacTutor site.


Historical remark 2.3.2     Euclid's Elements
Historical remark 2.4.7     Bezout and friends
Historical remark 3.1.1     Diophantine and his equations
Historical remark 3.5.2     Bachet de Méziriac
Historical remark 3.5.3     Bachet equation
Historical remark 3.5.6     Catalan's conjecture - solved
Historical remark 5.3.3     Ancient Chinese work on remainders
Historical remark 6.2.5     Eratosthenes
Historical remark 7.2.4     Hensel's Lemma
Historical remark 11.3.5    Diffie and Hellman
Historical remark 11.4.1    Diffie-Hellman controversy
Historical remark 11.5.2    Who is RSA?
Historical remark 11.6.2    Sophie Germain
Historical remark 12.1.5    Marin Mersenne
Historical remark 12.1.7    GIMPS
Historical remark 12.1.8    The Skylake bug
Historical remark 12.6.7    Factoring Fermat
Historical remark 13.0.2    Albert Girard
Historical remark 13.0.3    Leonhard Euler
Historical remark 13.0.4    Pierre de Fermat
Historical remark 13.1.8    Fibonacci
Historical remark 13.4.16   Hermann Minkowski
Historical remark 14.1.3    Carl Friedrich Gauss
Historical remark 15.3.2    Louis Mordell
Historical remark 15.5.6    Brahmagupta
Historical remark 15.5.7    Stigler's Law
Historical remark 16.3.7    Joseph-Louis Lagrange
Historical remark 16.4.1    Adrien-Marie Legendre
Historical remark 17.2.1    Gotthold Eisenstein
Historical remark 19.4.13   Thabit ibn Qurra
Historical remark 21.2.7    Skewes' Number
Historical remark 21.3.2    The Prime Number Theorem
Historical remark 21.3.3    Pafnuty Chebyshev
Historical remark 22.2.2    Lejeune Dirichlet
Historical remark 22.3.9    The Pentium bug
Historical remark 22.3.10   Twin prime status
Historical remark 23.1.2    August Mobius
Historical remark 24.2.3    Bernhard Riemann


Notation


This is a quick guide to possibly unfamiliar notation.


Symbol                  Description
ℤ                       (ring of) integers
ℕ                       counting numbers (starting at zero)
a | b                   a is a divisor of b
gcd(a, b)               greatest common divisor of a and b
⌊x⌋                     greatest integer (floor) function
a ≡ b (mod n)           a is congruent to b modulo n
[a]                     the equivalence class of a modulo some fixed n
a^{-1}                  multiplicative inverse of a number modulo some fixed n
∏_{i=1}^{k} p_i         product of unspecified, possibly identical, primes
∏ p_i                   short form for product of primes
∏ q_j                   alternate short form for product of primes
∏_{i=1}^{k} p_i^{e_i}   product of unspecified distinct prime powers
∏ p_i^{e_i}             short form for product of prime powers
p^k ∥ n                 for p prime, p^k | n but p^{k+1} does not divide n
n!                      n factorial
lcm(a, b)               least common multiple of a and b
ℤ_n                     (ring of) integers modulo n
A \ {a}                 the set of all elements in A except a ∈ A
|G|                     order of a group G
|x|                     order of a group element x ∈ G
U_n                     group of units modulo n
φ(n)                    order of the group of units of n (Euler function)
ϕ(n)                    alternate notation for Euler φ function
F_n                     Fermat number 2^{2^n} + 1
M_n                     Mersenne number 2^n − 1
r_2(n)                  number of different ways to write n as a sum of two squares
ℤ[i]                    Gaussian integers {a + bi | a, b ∈ ℤ}
ℂ                       complex numbers

number of different ways to write n as a sum of k perfect squares
abbreviation for 'quadratic residue'
group of quadratic residues of p
Legendre symbol, for p an odd prime
multiples of positive even numbers less than p by a
set of nonnegative remainders of elements of aE modulo p
remainder modulo p of the element ae of aE
Jacobi symbol, n odd
sum over even e in proof of quadratic reciprocity
another sum in proof of quadratic reciprocity
alternate notation for r2(n)
sum of kth powers of divisors of n
number of (positive) divisors of n
sum of (positive) divisors of n
unit function
identity function
abundancy index of n
'Big Oh' notation that a function is less in absolute value than Cg(x), for some constant C
natural (base e) logarithm
Euler-Mascheroni gamma constant, limit of difference between the harmonic series and natural logarithm
Gamma function factorial extension
prime counting function
number of integers coprime to first a primes
the ath prime
logarithmic integral $\int_2^x \frac{dt}{\ln(t)}$
Chebyshev theta function
prime number indicator function
primorial (product of primes up to p)
twin prime constant
Moebius function of n
Dirichlet product of f and g as arithmetic functions
Dirichlet product identity function
number of unique prime divisors of n
alternate notation for ω(n)
Liouville's function
Riemann zeta function
Dirichlet eta function
auxiliary function in Riemann explicit formula



List of Figures


Chapter 1 Prologue 


Figure 1.4.1 FoxTrot comic 


Chapter 3 From Linear Equations to Geometry 


Figure 3.2.1 Solutions to a linear Diophantine equation 
Figure 3.3.2 Positive solutions to a linear Diophantine equation 
Figure 3.5.1 Visualizing when a cube is one less than a square 


Chapter 6 Prime Time 
Figure 6.2.2 Part of Euclid IX.20 proof 


Chapter 7 First Steps With General Congruences 
Figure 7.6.2 Solutions of a typical Mordell curve 

Figure 7.8.1 Cutting a cake with 7 candles using two cuts 
Figure 7.8.2 Cutting a cake with 6 candles using two cuts 
Figure 7.8.4 Stellated 7-gons 


Chapter 8 The Group of Integers Modulo n 
Figure 8.1.2 Addition table for Zs; 

Figure 8.1.3 Multiplication table for Z3 

Figure 8.1.4 Visualizing multiplication modulo n = 7 
Figure 8.2.1 Visualizing powers modulo n = 11 


Chapter 10 Primitive Roots 
Figure 10.0.1 Visualizing powers modulo n = 11 (again) 
Figure 10.1.2 Visualizing powers modulo n = 10 


Chapter 12 Some Theory Behind Cryptography 
Figure 12.2.1 Visualizing powers modulo n = 11 (yet again) 
Figure 12.3.1 Visualizing powers modulo n = 11 (last time) 


Chapter 13 Sums of Squares 
Figure 13.1.5 Five as a sum of squares


Appendix E


References and Further Resources

E.1 Introduction to the References 


There are so many resources I used in preparation of this book it would be very 
hard to list all of them. Still, I have a lot of recommendations for further read- 
ing, places for instructors to look for alternate examples, proofs, exercises, etc., 
and most of these are books I have actively used at some point. I attempted 
to include a canonical website for each book, though be aware that especially 
publisher pages may change at short notice. I’ve also included some valuable 
articles I have benefited from. 


E.2 General References 
There are many good introductory number theory texts. 


[1] Gareth A. and J. Mary Jones, Elementary Number Theory, Springer,
London, (2005). (Website¹)
A good introduction with an emphasis on groups, containing interleaved
exercises with full answers.

[2] G.H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers,
fifth edition, Oxford, (1979) (Website² for expanded sixth edition)
A highly regarded text with copious notes, but sometimes more than a
little hard to parse with its consecutively numbered theorems and very
dense prose.

[3] William Stein, Elementary Number Theory: Primes, Congruences, and
Secrets, Springer, (2008) (Website³)
Freely available and the first Sage-enabled number theory text, by the
founder of Sage (a number theorist).

[4] Ken Rosen, Elementary Number Theory and its Applications, Pearson,
(2011). (Website⁴)
A venerable text with programming exercises that still wear well.


¹ www.springer.com/us/book/9783540761976
² global.oup.com/academic/product/an-introduction-to-the-theory-of-numbers-9780199219865
³ wstein.org/ent/
⁴ www.pearsonhighered.com/educator/product/Elementary-Number-Theory/9780321500311.page

[5] David C. Marshall, Edward Odell, Michael Starbird, Number Theory
through Inquiry, Mathematical Association of America, Washington, (2007).
(Website⁵)
The topics are very standard, but the approach is quite different; no
proofs, only statements. This turns out to be a highly effective pedagogy;
see the Academy of Inquiry Based Learning⁶ for more information.

[6] R.P. Burn, A pathway into number theory, Cambridge, (1996) (Website⁷)
A very fun inquiry-driven text before there were such things, with a lot
of extremely good examples, especially in things like quadratic forms.

[7] John Stillwell, Elements of Number Theory, Springer, (2003) (Website⁸)
More algebraically oriented, with good material on the Pell equation and
Gaussian integers; noteworthy for a good treatment of Conway's river
concepts.

[8] Harold Shapiro, Introduction to the Theory of Numbers, Dover, (2008)
(No website)
Incredibly comprehensive, at a fairly high level. Good material on
averages and odd perfection, immense bibliography and notes in style of
[E.2.2], and also inquiry-driven "do-it-yourself" sections. Appears to be
out of print.

[9] Anthony Gioia, The Theory of Numbers, Dover, (2001) (No website)
Surprisingly detailed and high-level but has good coverage of several
unusual topics such as geometry of numbers.

[10] Marty Erickson, Anthony Vazzana, David Garth, Introduction to Number
Theory, second edition, CRC, (2016). (Website⁹)
Enough material for two courses, some fairly advanced, and newly
endowed with downloadable Sage worksheets for use with local or online
CoCalc¹⁰.

[11] George Andrews, Number Theory, Dover, (1994) (Website¹¹)
Yet another nice reprint from Dover, this one with (as one would expect
of the author) great combinatorial content.

[12] H.M. Edwards, Higher Arithmetic: An Algorithmic Introduction to
Number Theory, American Mathematical Society, (2006) (Website¹²)
Not so algorithmic, but very, very concrete and constructive. Squares
are Us, which grows on the reader.

[13] Neville Robbins, Beginning Number Theory, Jones and Bartlett, (2006)
(No website)
An out-of-print standard text with many similar topics and interesting
historical comments.

[14] Oystein Ore, Invitation to Number Theory, Mathematical Association of
America, (1967) (Website¹³)
An older text that is still worth the conversational tone.

[15] Duff Campbell, An Open Door to Number Theory, American
Mathematical Society/MAA Press, (2018), (Website¹⁴)
Careful emphasis throughout on getting a novice student ready for
abstract algebra/algebraic number theory, with Q[√d] coherent in an
elementary text. Don't miss continued fractions in the service of the Bezout
identity and the many interesting projects, including one on the p-adic
numbers.

[16] Robert Freud and Edit Gyarmati, Number Theory, American
Mathematical Society, (2020), (Website¹⁵)
See my review for MAA reviews¹⁶ of this relatively ambitious text. Could
be very interesting to use for a two-semester algebra sequence that starts
with number theory.

[17] Cam McLeman, Erin McNicholas, and Colin Starr, Explorations in
Number Theory: Commuting through the Numberverse, Springer, (2022),
(Website¹⁷)
An inquiry-friendly introduction with a uniquely elementary tilt toward
algebraic number theory. And many, many puns.


⁵ www.maa.org/publications/ebooks/number-theory-through-inquiry
⁶ www.inquirybasedlearning.org
⁷ www.cambridge.org/us/academic/subjects/mathematics/number-theory/pathway-number-theory-2nd-edition
⁸ www.springer.com/us/book/9780387955872
⁹ tvazzana.sites.truman.edu/introduction-to-number-theory/
¹⁰ cocalc.com
¹¹ store.doverpublications.com/0486682528.html
¹² bookstore.ams.org/stml-45/
¹³ www.maa.org/press/ebooks/invitation-to-number-theory


E.3 Proof and Programming References 


The first few books here are good resources for an introduction to proof, which 
should cover anything needed as a prerequisite for this text. 

In addition to the many good programming exercises in several books in 
the General References, the latter books will give you an introduction to the 
programming side of things. 


[1] Richard Hammack, Book of Proof, (2018). (Website¹⁸)
A quality middle-of-the-road introduction to proof, used reasonably widely
and covering all standard topics for a proof transition course.

[2] Joseph Fields, A Gentle Introduction to the Art of Mathematics, (2013).
(Website¹⁹)
The title is pretty accurate; this is a quite gentle open text usable for
self-study.

[3] Edward Burger, Extending the Frontiers of Mathematics, Key College,
(2007) (Website²⁰)
This book is not necessarily just an introduction to proof, but has a
wonderful attitude to conjecture. Essentially, one should view every proof
as an opportunity to extend, and every disproof as a chance to rescue.

[4] Gregory Bard, Sage for Undergraduates, American Mathematical Society,
(2015) (Website²¹)
This is a very good guide to Sage for anyone starting out with basic
college math knowledge; the author has taught using Sage for some time.
Did I mention it is freely downloadable as well as available in print?

[5] Craig Finch, Sage: Beginner's Guide, Packt, (2011) (Website²²)
This guide is not free, but is comprehensive (for the time it was
written) and has the unique perspective of someone not involved in the Sage
community.


¹⁴ bookstore.ams.org/text-39
¹⁵ bookstore.ams.org/amstext-48/
¹⁶ www.maa.org/press/maa-reviews/number-theory-2
¹⁷ link.springer.com/book/10.1007/978-3-030-98931-6
¹⁸ www.people.vcu.edu/~rhammack/BookOfProof/
¹⁹ giam.southernct.edu/GIAM/
²⁰ www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP000280.html
²¹ www.gregorybard.com/Sage.html
²² www.packtpub.com/hardware-and-creative/sage-beginners-guide


[6] Paul Zimmermann et al., Computational Mathematics with SageMath,
SIAM/the authors, (2018) (Website²³)
This is an updated English edition of a very comprehensive book
originally written in French. Includes everything from numerics to graph
theory. Available for free as well.²⁴

[7] Allen Downey, Think Python, O'Reilly, (2012) (Website²⁵)
A very good introduction to programming from scratch in Python, usable
from the website or as a hard-copy text.

[8] Zed Shaw, Learn Python the Hard Way, Addison-Wesley, (2013)
(Website²⁶)
A preternaturally idiosyncratic take on how to program, but well worth
the effort to learn things the hard way if you have the time to push
through it.


E.4 Specialized References 


Number Theory is a huge field, and even at an introductory level there are 
many wonderful resources to be aware of. I have used many of the following 
in one way or another in preparation of this text, and if you are intrigued by a 
specific facet of number theory, I encourage you to get these from your library! 
Most of these are more specialized, but a few are not really texts but intended 
for the “casual” reader. 


[1] John Derbyshire, Prime Obsession, Joseph Henry Press, (2003)
(Website²⁷)
A marvelous achievement of bringing the Riemann Hypothesis to the
(determined) lay reader while simultaneously making you care about
post-Napoleonic Europe. If I do say so myself²⁸.

[2] Roland van der Veen and Jan van de Craats, The Riemann Hypothesis,
Mathematical Association of America, (2016). (Website²⁹)
Interesting lecture notes leading to a basic understanding of the Riemann
Hypothesis, based on a high-school enrichment program in the
Netherlands.

[3] Barry Mazur and William Stein, Prime Numbers and the Riemann
Hypothesis, Cambridge University Press, (2016). (Website³⁰)
This book goes straight for the jugular of the Riemann Hypothesis,
starting from scratch. That requires a lot of investment, but you won't find it
from the perspective of working number theorists in other books, either.

[4] H.M. Edwards, Riemann's Zeta Function, Dover, (2001) (Website³¹)
Still useful comprehensive first text on this important topic.


23bookstore.siam.org/ot160 

24dL. Lateralis.org/public/sagebook/sagebook-ba6596d. pdf 

25 sreenteapress.com/wp/think-python/ 

26 Learncodethehardway.org/python/ 
27www.nap.edu/catalog/10532/prime-obsession-bernhard-riemann-and-the-greatest-unsolved-problem- in 
28www.booksandculture.com/articles/2009/janfeb/prime. html 
2°www.maa.org/press/books/the-riemann-hypothesis 

30wstein.org/rh/ 

31store.doverpublications.com/0486417409.htmL 
[5] Jeffrey Stopple, A Primer of Analytic Number Theory, Cambridge, (2003).
(Website³²)
Very innovative book on exactly what it says; second half not necessarily
for every US undergraduate, but easiest introduction to
Birch-Swinnerton-Dyer I could find! Covers most traditional material, too,
and has copious entertaining historical notes.

[6] Tom Apostol, Introduction to Analytic Number Theory, Springer, (1976).
(Website³³)
The canonical “undergraduate” analytic number theory book. Monumental
but very difficult; zillions of interesting results in exercises.

[7] Stan Wagon and David Bressoud, A Course in Computational Number
Theory, Wiley, (2008). (Website³⁴)
Contains Mathematica code to visualize and explore a lot of interesting
number theory, and is very consistent with the computational viewpoint
throughout.

[8] Paul Pollack, Not Always Buried Deep, American Mathematical Society,
(2009). (Website³⁵)
Definitely a second course in number theory, as the subtitle says, with
good material on arithmetic progressions and the Hilbert-Waring problem
(the latter is difficult to find in a textbook).

[9] Saban Alaca and Kenneth S. Williams, Introductory algebraic number
theory, Cambridge University Press, (2003). (Website³⁶)
As the title says, and one appropriate for an undergraduate library.

[10] Harold Davenport, The Higher Arithmetic, Cambridge University Press,
(2008). (Website³⁷)
Another well-known general resource, with a very good description of how
to find if a rational conic has a rational point (which directly connects
to integer points on conics as well).

[11] Stephen Richards, A Number for Your Thoughts, S. P. Richards, (1982)
(No website)
Many very interesting topics for the general reader, from repunits to all
sorts of other topics. Intriguing story must lie behind the essentially
identical book by a different author several years later.

[12] Samuel S. Wagstaff, Jr., The Joy of Factoring, American Mathematical
Society, (2013). (Website³⁸)
The title says it all, and more accessible to college students than one
would think. By one of the leaders in the field.

[13] George Andrews and Kimmo Eriksson, Integer Partitions, Cambridge
University Press, (2004). (Website³⁹)
A brilliant, accessible, inventive book which makes me very sad there is
only enough time for so many topics in a one-semester course. Indispensable
for bringing partitions to undergraduates.

32 www.cambridge.org/us/academic/subjects/mathematics/number-theory/primer-analytic-number-theory-pythagoras-riemann
33 www.springer.com/us/book/9780387901633
34 www.wiley.com/WileyCDA/WileyTitle/productCd-0470412151.html
35 bookstore.ams.org/mbk-68
36 www.cambridge.org/core/books/introductory-algebraic-number-theory/
37 www.cambridge.org/us/academic/subjects/mathematics/number-theory/higher-arithmetic-introduction-theory-numbers-8th-edition
39 www.cambridge.org/us/academic/subjects/mathematics/number-theory/integer-partitions

[14] Richard Friedberg, An Adventurer’s Guide to Number Theory, Dover,
(1995) (Website⁴⁰)
Very conversational and enjoyable; not really a textbook. Key feature
is a detailed discussion of how Euler missed what is essentially unique
factorization in a certain number field for two of his more interesting
results, and he does it without actually proving unique factorization!

[15] Julian Havil, Gamma: Exploring Euler’s Constant, Princeton, (2009).
(Website⁴¹)
This book turns out to be about both Γ the function and γ the constant
(recall Definition 20.3.10), and includes a description of Apéry’s tomb
(see Remark 24.4.1 with regard to ζ(3)).

[16] C. D. Olds, Anneli Lax, Giuliana Davidoff, The Geometry of Numbers,
Mathematical Association of America, (2000) (Website⁴²)
Delightful introduction to and inspiration for many of the lattice topics
pursued in this text. The second half goes fairly deep, and is more than
worth pursuing as a directed study with undergraduates.

[17] Paulo Ribenboim, The Little Book of Bigger Primes, Springer, (2004)
(Website⁴³)
This book has incredible amounts of interesting detail regarding many of
the prime topics considered here. An example: a discourse on whether
the pseudoprime criterion base 2 was really discovered by ancient Chinese
mathematicians.

[18] Paulo Ribenboim, My Numbers, My Friends, Springer, (2000) (Website⁴⁴)
Based on a series of lectures, this book is rather higher level, but has
correspondingly more truly interesting material, including an entire chapter
inspired by 1093 and a very early prime-generating algorithm by a certain
Pocklington.

[19] Thomas R. Shemanske, Modern Cryptography and Elliptic Curves: A
Beginner’s Guide, American Mathematical Society, (2017) (Website⁴⁵)
This really is a beginner’s guide, which developmentally arrives at
addition on projective elliptic curves. The focus on cryptography is clear with
Lenstra’s ECM algorithm as payoff, but BSD is also reasonably described.
But why mention safe primes and not Germain primes?

[20] Martin H. Weissman, An Illustrated Theory of Numbers, American
Mathematical Society, (2017), (Website⁴⁶)
Lushly illustrated, including for nonstandard topics like Conway’s
topograph and Gaussian/Eisenstein. Emphasis on dynamical point of view,
even for Euler’s Theorem. Well-researched historical notes, and linked
Jupyter notebooks on the website.

[21] Benjamin Hutz, An Experimental Introduction to Number Theory,
American Mathematical Society, (2018), (Website⁴⁷)
Many in-depth topics somewhat beyond a standard semester course, such
as height and Diophantine approximation. Unique in covering dynamical
systems on polynomials over Q. The intriguing exploratory exercises lack
pseudocode.

40 store.doverpublications.com/0486281337.html
41 press.princeton.edu/titles/7494.html
42 www.maa.org/press/ebooks/the-geometry-of-numbers
43 www.springer.com/us/book/9780387201696
44 www.springer.com/us/book/9780387989112
45 bookstore.ams.org/stml-83
46 illustratedtheoryofnumbers.com
47 bookstore.ams.org/amstext-31

[22] Alasdair McAndrew, Introduction to Cryptography with Open-Source
Software, CRC, (2011), (Website⁴⁸)
I have not read this, but with full sections on DES and AES, elliptic
curves, and “El Gamal in Sage”, I think it could be a good complement
on the application side to many of the texts in these references.

[23] Avner Ash and Robert Gross, Fearless Symmetry, Princeton, (2008),
(Website⁴⁹)
Astonishingly, builds up in a conversational tone from practically nothing
to Galois representations coming from elliptic curves and the connection
to Fermat’s Last Theorem. Explicitly connects quadratic reciprocity to
quadratic curves, for instance. Highly recommended.

[24] Avner Ash and Robert Gross, Elliptic Tales, Princeton, (2012),
(Website⁵⁰)
A followup to [E.4.23], which attempts to explain elliptic curves from the
ground up through to their L-functions and the Birch-Swinnerton-Dyer
conjecture.

[25] Lasse Rempe-Gillen and Rebecca Waldecker, Primality Testing for
Beginners, American Mathematical Society, (2014), (Website⁵¹)
Although it does cover a lot of basic number theory, the unusual main
focus is making the proof of Agrawal, Kayal, and Saxena that deciding
whether a number is prime is in the computational complexity class P
directly accessible to (talented) high school and university students.

[26] Paul Pollack, A Conversational Introduction to Algebraic Number Theory,
American Mathematical Society, (2017), (Website⁵²)
Definitely requires a good ring and field background, but also truly
conversational. It starts with a very thorough treatment of quadratic number
fields, then starts over, meanwhile making reference to a startling number
of both original papers from the nineteenth century and very recent
Monthly articles.

[27] Roger Plymen, The Great Prime Number Race, American Mathematical
Society, (2020), (Website⁵³)
A great deal of information about the zeta function, especially the
functional equation, with an eye toward both the explicit formulas and
specifically Littlewood and Skewes’ results.


E.5 Historical References 


Number Theory is also a very old field, as should be clear from using this 
book. Here I have collated references intended both for mathematicians and 
the fabled ‘educated laity’. (Note that many of the other books referenced here 
have significant historical content, notably [E.4.5].) 
[1] Jim Tattersall, Elementary Number Theory in Nine Chapters, Cambridge
University Press, (2005) (Website⁵⁴)
Oodles of class-tested historical material and many, many exercises,
including a welter of them on topics surrounding amicable numbers.

[2] John J. Watkins, Number Theory: A Historical Approach, Princeton,
(2013). (Website⁵⁵)
A very nice historically-oriented approach to elementary number theory.
Includes Sage material in an appendix.

[3] Oystein Ore, Number Theory and Its History, Dover, (1948). (Website⁵⁶)
Another conversational classic by Ore, with plenty of historical goodies.

[4] Jay Goldman, The Queen of Mathematics, AK Peters, (1997) (Website⁵⁷)
A truly historical sojourn through much of number theory up through
the early twentieth century, with extensive primary source material and
investigation of Gauss’ monumental work. Sadly, largely beyond the level
of this text.

[5] William Dunham, Journey Through Genius, Wiley, (1990). (Website⁵⁸)
This is intended for those without calculus, but has many great
number-theoretic bits all the same.

[6] William Dunham, Euler: The Master of Us All, Mathematical Association
of America, (1999). (Website⁵⁹)
This book has some nice discussion of Euler’s number theory alongside
many other historical vignettes with real math power.

[7] A. Knoebel et al., Mathematical Masterpieces: Further Chronicles by the
Explorers, Springer, (2007). (Website⁶⁰)
Collection of additional classroom resources focused on primary source
material, including the Basel problem and quadratic reciprocity.

[8] André Weil, Number Theory: An approach through history From
Hammurapi to Legendre, Birkhäuser, (1984). (Website⁶¹)
Absolutely first-rate mathematician’s insider view into the contributions
of Fermat and Euler. Plenty of opinions and connections to modern
mathematics, though sadly it will never be updated to connect Wiles’
work on elliptic curves to Fermat’s legacy.

[9] Waclaw Sierpinski, Pythagorean Triangles, Dover, (2013). (Website⁶²)
In general it’s accessible to a student using this book, though as a reprint
of a fifty-year-old book it (as a recent College Mathematics Journal review
put it) could use ‘certain updates’.

[10] Ulrich Libbrecht, Chinese Mathematics in the Thirteenth Century, Dover,
(1973). (No website)
Reprint of MIT Press original publication (now out of print), an
extremely thorough discussion of Qin Jiushao’s entire mathematical opus
within its cultural context. About half the book is a monograph on the
Chinese Remainder Theorem, hence its inclusion in this set of references.

54 www.cambridge.org/us/academic/subjects/mathematics/number-theory/elementary-number-theory-nine-chapters-2nd-edition?format=PB
55 press.princeton.edu/titles/10165.html
56 store.doverpublications.com/0486656209.html
57 www.crcpress.com/The-Queen-of-Mathematics-A-Historically-Motivated-Guide-to-Number-Theory/Goldman/p/book/9781568810065
58 www.wiley.com/WileyCDA/WileyTitle/productCd-0471500305.html
59 www.maa.org/press/books/euler-the-master-of-us-all

[11] Alireza Djafari Naini, Geschichte der Zahlentheorie im Orient, Verlag
Klose und Co., (1982). (No website)
Special focus on number theory in the medieval era in the Islamic world,
especially Persian mathematicians. Many explicit examples, and
comparisons with Diophantus and more modern sources.

[12] Paul M. Nahin, In Pursuit of Zeta-3: The World’s Most Mysterious
Unsolved Math Problem, Princeton, (2021). (Website⁶³)
More information than you would ever want to know about Apéry’s
constant, by an engaging author. Many computations.

[13] David Pengelley, Number Theory Through the Eyes of Sophie Germain:
An Inquiry Course, Mathematical Association of America, (2023).
(Website⁶⁴)
This book really fits under several categories. If you are interested in
using an inquiry-based pedagogy, have fairly well-prepared students, and
are interested in using primary sources to explore number theory with
them, why not explore with one of the greats, Sophie Germain? Early
reviews (such as this one⁶⁵) have been laudatory.


E.6 Other References 


Some books are just interesting, even if they are not primarily about number 
theory. I enjoyed all of these a great deal and recommend them. 


[1] Richard Evans Schwartz, You Can Count on Monsters, A K Peters,
(2010) (Website⁶⁶)
This delightful picture book has a different monster for each prime
number, with bizarre combinations for composites. Personal experience says
it satisfies for ages three and up.

[2] Nathan Carter, Visual Group Theory, Mathematical Association of
America, (2009). (Website⁶⁷)
Visualize group theory; gorgeous pictures.

[3] John H. Conway and Richard Guy, The Book of Numbers, Springer,
(1996). (Website⁶⁸)
A joyous and pictorially engaging romp.

[4] Arthur T. Benjamin and Ezra Brown (eds.), Biscuits of Number Theory,
Mathematical Association of America, (2009). (Website⁶⁹)
A very good compendium of many articles (published throughout the
years) most appropriate for teachers of undergraduate number theory.

[5] Kerins et al., Famous Functions in Number Theory, American
Mathematical Society, (2015). (Website⁷⁰)
Aimed at bringing number theory to in-practice or pre-practice educators,
this has a very nice treatment of arithmetic functions. Once you’ve heard
of summation and Moebius inversion as ‘parent’ and ‘child’ relationships,
you’ll never think of them the same again.


[6] Kerins et al., Applications of Algebra and Geometry to the Work of
Teaching, American Mathematical Society, (2015). (Website⁷¹)
Aimed at bringing algebra and geometry to in-practice or pre-practice 
educators; manages to bring Gaussian and Eisenstein integers and some 
quadratic forms in at the ground level. 


[7] T.S. Michael, How to Guard an Art Gallery, Johns Hopkins, (2009)
(Website⁷²)
The subtitle is “and other discrete mathematical adventures”, and that 
about says it. Covers a surprising amount of number theory in very visual 
ways. 


[8] Robert Young, Excursions in Calculus: An Interplay of the Continuous 
and Discrete, Mathematical Association of America, (1992) (Website⁷³)
Unfortunately no longer in print, but a very good source of ideas for 
connecting what we usually think of as the continuous world of calculus 
and various discrete topics (not just number theory, though this shows 
up in several chapters). 


[9] Dora Musielak, Prime Mystery: The Life and Mathematics of Sophie
Germain, AuthorHouse, (2015) (Website⁷⁴)
The title says it all, and this is probably the most comprehensive resource
on this intriguing mathematician out there. As is typical for a samizdat, it
could use more editing and probably speculates a bit much, but given
how little we know about Germain it is still impressive.


[10] Alan Beardon, Mathematical Exploration, Cambridge, (2016) (Website⁷⁵)
Part of the AIMS⁷⁶ Library Series, this book includes plenty of fun,
directed, proto-research on topics like families of Pythagorean triples and
the conductor. Explore!


[11] Apostolos Doxiadis, Uncle Petros and Goldbach’s Conjecture,
Bloomsbury, (2000) (Website⁷⁷)
This ‘novel of mathematical obsession’ is a Bildungsroman of sorts that
does a surprisingly good job of also introducing the still-unproven
conjecture that any even number greater than four is the sum of two odd
primes.


[12] Riley Tipton Perry, Quantum Computing from the Ground Up, World
Scientific, (2012) (Website⁷⁸)
The best elementary introduction to quantum computing I’ve yet found.
That doesn’t make it easy, but it could certainly be used with
undergraduates with a modicum of linear algebra and familiarity with complex
numbers (and circuitry!). Yes, Shor’s algorithm and QFT are outlined at
this level, rather remarkably.
E.7 Useful Articles


Throughout the text, I’ve attempted to reference articles in so-called ‘generalist’ 
mathematics publications which have been useful or intriguing. See also the 
collection [E.6.4], where some of these appear. 


[1] Ivan Niven and Barry Powell, Primes in Certain Arithmetic Progressions,
The American Mathematical Monthly, June-July 1976, 83 no. 6, 467-469.

[2] D. Zagier, A One-Sentence Proof That Every Prime p ≡ 1 (mod 4) Is
a Sum of Two Squares, The American Mathematical Monthly, February
1990, 97 no. 2, 144-144.

[3] Andrew Granville and Greg Martin, Prime Number Races, The American
Mathematical Monthly, January 2006, 113 no. 1, 1-33.

[4] David A. Cox, Why Eisenstein Proved the Eisenstein Criterion and Why
Schönemann Discovered It First, The American Mathematical Monthly,
January 2011, 118 no. 1, 3-21.

[5] Steven H. Weintraub, On Legendre’s Work on the Law of Quadratic
Reciprocity, The American Mathematical Monthly, March 2011, 118 no. 3,
210-216.

[6] Jonathan Bayless and Dominic Klyve, Reciprocal Sums as a Knowledge
Metric: Theory, Computation, and Perfect Numbers, The American
Mathematical Monthly, November 2013, 120 no. 9, 822-831.

[7] Xianzu Lin, Infinitely Many Primes in the Arithmetic Progression kn − 1,
The American Mathematical Monthly, January 2015, 122 no. 1, 48-51.

[8] Reinhard Laubenbacher and David Pengelley, Eisenstein’s Misunderstood
Geometric Proof of the Quadratic Reciprocity Theorem, The College
Mathematics Journal, January 1994, 25 no. 1, 29-34.

[9] Roger B. Nelsen, Proof Without Words: Square Triangular Numbers and
Almost Isosceles Pythagorean Triples, College Mathematics Journal, May
2016, 47 no. 3, 179-179.

[10] David Lowry-Duda, Unexpected Conjectures about -5 Modulo Primes,
College Mathematics Journal, January 2015, 46 no. 1, 56-57.

[11] William G. Stanton and Judy A. Holdener, Abundancy “Outlaws” of the
Form (σ(N) + t)/N, Journal of Integer Sequences, 10

[12] D.R. Slavitt, Give Way To God, or The Dying Christ – Pierre de Fermat,
The Mathematical Intelligencer, Summer 2012, 34 no. 2, 3-5.

[13] Paul Nahin, The Mysterious Mr. Graham, The Mathematical
Intelligencer, Spring 2016, 38 no. 1, 48-51.

[14] P. A. Weiner, The abundancy index, a measure of perfection,
Mathematics Magazine, October 2000, 73 no. 4, 307-310.

[15] Andrew Bremner, Positively prodigious powers or how Dudeney done it?,
Mathematics Magazine, April 2011, 84 no. 2, 120-125.

[16] Rafael Jakimczuk, The Quadratic Character of 2, Mathematics Magazine,
April 2011, 84 no. 2, 126-127.

[17] Russell A. Gordon, Properties of Eisenstein Triples, Mathematics
Magazine, February 2012, 85 no. 1, 12-25.

[18] Roger B. Nelsen, Proof Without Words: Infinitely Many Almost-Isosceles
Pythagorean Triples Exist, Mathematics Magazine, April 2016, 89 no. 2,
103-104.

[19] C. Edward Sandifer, How Euler Did It: Odd Perfect Numbers, MAA
Online, November 2006

[20] Matthias Beck, How to change coins, M&M’s, or chicken nuggets: The
linear Diophantine problem of Frobenius, in Resources for Teaching
Discrete Mathematics: Classroom Projects, History Modules, and Articles
(B. Hopkins, ed.), Mathematical Association of America, 2009, 65-74.

[21] S. A. Rankin, The Euclidean Algorithm and the Linear Diophantine
Equation ax+by = gcd(a, b), The American Mathematical Monthly, June-July
2013, 120 no. 6, 562-564.

[22] F. Saidak, A new proof of Euclid’s theorem, The American Mathematical
Monthly, December 2006, 113 no. 10, 937-938.

[23] Yannick Saouter and Patrick Demichel, A sharp region where π(x) − li(x)
is positive, Mathematics of Computation, October 2010, 79 no. 272,
2395-2405.

[24] Kent Boklan and John Conway, Expect at Most One Billionth of a New
Fermat Prime!, The Mathematical Intelligencer, 2017, 39 no. 1, 3-5.

[25] Bruce Berndt et al., The Circle Problem of Gauss and the Divisor
Problem of Dirichlet: Still Unsolved, The American Mathematical Monthly,
February 2018, 125 no. 2, 99-114.

[26] William Dunham, The Early (and Peculiar) History of the Möbius
Function, Mathematics Magazine, April 2018, 91 no. 2, 83-91.

[27] Enrique Treviño, An Inclusion-Exclusion Proof of Wilson’s Theorem, The
College Mathematics Journal, November 2018, 49 no. 6, 367-377.

[28] John Cosgrave and Karl Dilcher, Extensions of the Gauss-Wilson
Theorem, Integers, 2008, 8 no. 1, A39.

[29] Ernest Eckert, The Group of Primitive Pythagorean Triangles,
Mathematics Magazine, February 1984, 57 no. 1, 22-27.

[30] John Brillhart, A Note on Euler’s Factoring Problem, The American
Mathematical Monthly, December 2009, 116 no. 10, 928-931.

[31] Christian Aebi and Grant Cairns, Sums of Quadratic Residues and
Nonresidues, The American Mathematical Monthly, February 2017, 124 no. 2,
166-169.

[32] A. Rotkiewicz and K. Ziemak, On Even Pseudoprimes, The Fibonacci
Quarterly, May 1995, 33 no. 2, 123-125.

[33] Lars-Daniel Öhman, Are Induction and Well-Ordering Equivalent?, The
Mathematical Intelligencer, September 2019, 41 no. 3, 33-40.

[34] Trevor Woolsey, A Superpowered Euclidean Prime Generator, The
American Mathematical Monthly, April 2017, 124 no. 4, 351-352.

[35] Edray Goins et al., Lattice Point Visibility on Generalized Lines of
Sight, The American Mathematical Monthly, August-September 2018,
125 no. 7, 593-601.

[36] Dylan Fridman et al., A Prime-Representing Constant, The American
Mathematical Monthly, January 2019, 126 no. 1, 70-73.

[37] Roger Nelsen, Even Perfect Numbers End in 6 or 28, Mathematics
Magazine, April 2018, 91 no. 2, 140-141.

[38] Howard Sporn, Pythagorean Triples, Complex Numbers, and Perplex
Numbers, The College Mathematics Journal, March 2017, 48, no. 2, 115-122.

[39] Aalok Thakkar, Infinitude of Primes Using Formal Languages, The
American Mathematical Monthly, October 2018, 125, no. 8, 745-749.

[40] Hing-Lun Chan and Michael Norrish, A String of Pearls: Proofs of
Fermat’s Little Theorem in “Hawblitzel C., Miller D. (eds.) Certified
Programs and Proofs, CPP 2012”, Lecture Notes in Computer Science, 7679,
188-207.

[41] Solomon Golomb, Combinatorial Proof of Fermat’s “Little” Theorem,
The American Mathematical Monthly, December 1956, 63, no. 10, 718.

[42] Steven R. Benson, Pythagorean Paper Folding, Mathematics Magazine,
February 2021, 94, no. 1, 34-42.

[43] Zafer Selcuk Aygin and Kenneth S. Williams, Why does a Prime p Divide
a Fermat Number?, Mathematics Magazine, October 2020, 93, no. 4, 288-
Index


abundancy index, 335 
and odd perfect numbers, 340 
abundancy outlaws, 336 
abundant number, see number, 
abundant 
aliquot parts, see divisor, proper 
amicable numbers, 336 
algorithm, 337 
Apéry’s constant, 424 
via Twitter, 430 
Aryabhata, 15 
associates, 239 
asymptotic, 371
average, see long-term average 


Bachet equation, 38, 255 
as special case of Mordell, 38 
Euler’s ‘proof’ of, 257 
Bachet, sieur de Méziriac, 37 
base a test, 190, 191 
Miller’s, see Miller’s test base 
a 
Bertrand’s postulate, 376 
Bezout identity, see Euclidean 
algorithm, extended 
Big Oh notation, see Landau 
notation 
Brahmagupta, 69, 216, 265 
quote about mathematicians, 
269 
Brun’s constant, 399 
Busy Beaver, 133 


Carmichael numbers, 192 
characterization of, 193 
certificate of primality, 197 
Chebyshev, 376 
Chinese remainder theorem, 61, 
63, 126 
example, 64 
for solving polynomial
congruences, 275 
practical application of, 68 
cipher, 155 
class number, 221 
CoCalc, xiv 
code, 155 
coin problem, see conductor 
combinatorics, 100 
completing the square, 277 
composite number, 71 
conductor, 2 
exercises, 5 
explore with Sage, 7 
solution, 135 
congruence 
arithmetic well-defined, 47 
of two numbers, 44 
same as having same 
remainder, 44 
congruences 
as solutions to congruences, 
92 
giving system of congruences, 
66 
linear, see linear congruences 
modular equivalent of 
equations, 52, 55 
quadratic, 273 
system of, see system of 
congruences 
congruent number problem, 34, 38 
conjecture 
Artin’s, 308 
Birch-Swinnerton-Dyer, 459,
475, 477
Carmichael’s, 133 
Catalan’s, 38, 257 
generalized 
Elliott-Halberstam, 400 
Goldbach, 398
Polignac’s, 396 
Riemann hypothesis, 447 
twin prime, see twin prime 
conjecture 
Von Koch’s, 442 
Wagstaff’s, 400 
continued fraction, 204 
convolution, see Dirichlet product 
coprime, 17 
cancellation in linear 
congruences, 57 
chances at random, 430 
needed for Diffie-Hellman, 162 
coprime in pairs, see mutually 
coprime 
counting numbers, see numbers, 
counting 
CRT, see Chinese remainder 
theorem 
cryptography, 155, see also 
encryption method 
Advanced Encryption 
Standard, 182 
asymmetric key, 161 
cipher, 155 
decode, 156 
decryption, 158 
digital signature, 175 
elliptic curve, 169 
encode, 156 
encryption, 158 
key exchange, 167 
‘man in the middle’ attack, 
168, 174 
public-key, 161, 169, 307 
secret sharing, 179 
symmetric key, 158 
trapdoor, 169 
Cython, 368 


decode, 156 
decryption, 158 
key, 158 
def, 156 
deficient number, see number, 
deficient 
density 
positive, 394 
zero, 394, 436 
Diffie-Hellman 
encryption, 162, 164 
key exchange, 167 
digital signature, 175 
Diophantine equations
general, 31 
higher-order, 248 
linear, see linear Diophantine 
equations 
Diophantus, 21 
Dirichlet, 393 
Dirichlet eta function, 436, 459 
Dirichlet product, 407 
Dirichlet series, 422 
Dirichlet’s Theorem, see primes, in 
an arithmetic progression 
divisibility, 4 
basic facts, 5 
division algorithm, 9 
uses of, 11 
divisor, 4 
common, 12 
greatest common, 12 
characterization, 12 
use in Pollard rho, 206 
zero and zero, 12 
proper, 4, 332 
divmod, 10 
Dodgson, Charles, see Lewis 
Carroll 
Dudeney, 253 


eggs in a basket, 69 
Eisenstein, 295, 298 
Eisenstein criterion 
ambiguous name, 298 
for quadratic residues, 298 
Elements, see Euclid’s Elements 
elliptic curves, 38, 457, 459 
cryptographic applications, 
169, 204 
Mordell curves as special case, 
205 
Mordell’s theorem on rational 
points, 258 
use in proving Fermat’s Last 
Theorem, 243 
encode, 156 
encryption key, 158 
encryption method 
Diffie-Hellman, 162, 164 
El-Gamal, 182 
elliptic curve, 169 
Goldwasser-Micali, 307 
RSA, 171 
eponymy 
Boyer’s law of, see Stigler’s 
law of eponymy 
Stigler’s law of, see Stigler’s
law of eponymy 
equivalence class, 48 
mod n, 47 
equivalence relation, 46 
congruence as example of, 46 
Eratosthenes, 76 
sieve of, see sieve of 
Eratosthenes 
Euclid’s Elements, 13 
perfect numbers, 332 
Euclidean algorithm, 13, 240 
applied to Fibonacci numbers, 
18 
example, 13 
extended, 14, 15 
example, 14, 15 
proof, 14 
statement, 13 
Euler, 213, 257 
and quadratic residues, 280 
son (Johann Albrecht), 244 
Euler φ function, 124, 320
long-term average, 435 
Euler products, 423 
Euler’s criterion 
for quadratic residues, 286 
Euler’s theorem, 125 
exploring formulas, 128 
multiplicative, 130 
using for inverses, 125 
visualization, 138 
Euler-Mascheroni constant, 359, 
397, 400, 476 
ir/rationality unknown, 359 
euler_phi, 126 
exponentiation (mod n) 
algorithm for, 51 
in cryptography, 161, 168 
not well-defined, 48 
visualization, 110 


factor, 72
factorial, 81, 95, 221, 274
prime, see prime, factorial
factorization 
continued fraction, 211 
Fermat, 202 
in cryptography, 198 
non-unique, 79, 85, 221 
of an integer, 76 
Pollard p − 1, 210
Pollard rho, 206 
prime, 76 
prime power, 76
quadratic sieve, 211 
Shor’s algorithm, 204 
trial, see trial division 
unique, 78, 79, 240, 257, see 
also fundamental 
theorem of arithmetic 
in Gaussian integers, 240 
Fermat, 213 
Fermat factorization, 202 
Fermat numbers, 170, 185 
factoring, 208 
Pépin’s test for primality, 306 
primes from, 186 
Fermat prime, 185 
Fermat’s last theorem, 36, 243 
Fermat’s little theorem, 97 
square root of, 194 
visualization, 194 
visualization, 137, 194 
Fibonacci, 69, 216 
numbers, see numbers, 
Fibonacci 
field, 109 
number, see number field 
with one element, 124 
Fields Medal, 255, 375, 394, 457 
floor function, see greatest integer 
function 
Frobenius number, see number, 
Frobenius 
FTA, see fundamental theorem of 
arithmetic 
function 
arithmetic, 319 
average value, see long-term 
average 
Chebyshev theta, 380 
Dirichlet identity, 408 
floor, see greatest integer 
function 
Gamma, 359, 476 
greatest integer, see greatest 
integer function 
identity, 330 
Liouville, 410 
Moebius μ, see Moebius μ
function 
multiplicative, see 
multiplicative function 
probability density, 371 
Riemann zeta, see zeta 
function 
step, 379 
sum of divisors, see sum of
divisors functions 
unit, 330 
fundamental region, 228 
fundamental theorem of 
arithmetic, 76 


gamma, see Euler-Mascheroni 
constant 
Gauss 
brief biographical notes, 237 
Gaussian integers, 237 
introducing congruence 
notation, 43 
letter to Encke, 373 
many proofs of quadratic 
reciprocity, 300 
prime numbers, 371, 373 
quote, ix 
Gaussian integers, see integers, 
Gaussian 
Gaussian prime, see prime, 
Gaussian 
gcd, see divisor, greatest common 
generator, see group, generator of 
Germain, 178 
Germain primes, 178 
and Artin’s conjecture, 309 
and Fermat’s Last Theorem, 
243 
and Mersenne numbers, 307 
GIMPS, 188 
Girard, 213 
greatest common divisor, see 
divisor, greatest common 
greatest integer function, 29 
convenience for turning 
functions into sums, 381 
use in estimating number of 
divisors, 350 
group, 114 
Abelian, 118, 262, 295, 411 
cyclic, 117, 189, 145 
example of non-Abelian, 118 
finite, 115 
generator of, 117, 139 
homomorphism, 288 
ideal class, 221 
identity, 113 
of quadratic residues, see 
quadratic residue, group 
of units, see units, group of 
order of, 116 
order of an element, 116 
quotient, 282
socks and shoes property, 115,
118
solving equations in, 115, 121


hardware bugs found using 
number theory, 188, 399 
harmonic series, 359, 421 
prime, see prime harmonic 
series 
Hensel’s lemma, 90 
for solving polynomial 
congruences, 275 
quadratic example, 91, 275 
special case, 92 
Historical remarks 
list of, 463 


identity element, 113 
induction proof, see proof by 
induction 
infinite descent, see proof by 
infinite descent 
integer lattice, 25, 324, 352 
as complex numbers, 238 
positive points, 26, 30 
integers, 1
Eisenstein, 245 
Gaussian, 237 
unique factorization, 240 
modulo n, 107 
integral test for series convergence, 
421 
@interact, 209 
interacts, see Sage interacts 
inverse 
computing with Sage, 62 
modulo p, 109 
of a group element, 114 
of a number, 62 
group of units, 123 
of a product, 115 
used in proof of CRT, 63 
visualize, 109 
inverse_mod, 62, 109 
irrational number, 25 
Apéry’s constant, 424 
examples, 84 
γ status unknown, 359
is_prime, 71 


Jacobi symbol, 303, 305 
same as Legendre for 2, 316 


key 
decryption, 158
encryption, 158
exchange, 167
Korselt’s theorem, 193 
Kronecker symbol, 304 
kronecker_symbol, 304


Lagrange, 281 
and quadratic residues, 280 
Lagrange’s theorem 
for polynomials, 94 
false for composite moduli, 
95 
vindicated, 221 
on group order, 117 
λ(n), see function, Liouville
Landau notation, 347 
basic exercises, 365 
prime counting function π(x)
computation, 377 
lattice 
general, 223 
integer, see integer lattice 
positive integer points, 26, 30 
sublattice, 226 
lcm, see least common multiple 
least common multiple, 18, 69, 84 
and divisor, greatest common, 
84 
Legendre 
biography, 282 
prime numbers, 371 
quadratic residues, 287 
Legendre symbol, 287 
computation, 302 
as checking parity, 297 
using Jacobi symbol, 304 
via Eisenstein, 298 
via Euler’s criterion, 288 
via quadratic reciprocity, 
301 
multiplicative, 283, 293 
legendre_symbol, 287
lemma, 44 
correct Greek plural of, 77 
easier English plural of, 77 
Lewis Carroll, 13 
Li(x), see logarithmic integral 
linear congruences, 55 
full solution, 55 
simplification strategies, 57 
linear Diophantine equations, 21 
geometric interpretation, 25 
solutions of, 22 
list comprehension, 52, 127
filtered, 139 
logarithm 
discrete, 150 
natural, 353 
logarithmic integral, 371 
long-term average 
Euler φ function, 435
sum of divisors functions, 355, 
362 
sums of squares, 326, 347 
Lucas-Lehmer test, 188 


‘man in the middle’ attack, 168, 
174 
MathJax, vii 
maximum, 80 
Mersenne, 187 
and amicable numbers, 337 
Mersenne numbers, 187 
and Germain primes, 307 
primes from, 189 
Mersenne primes, 187 
computer search, see GIMPS 
in perfect numbers, 332 
Mihailescu’s theorem, 38, see also 
conjecture, Catalan’s 
Miller’s test base a, 195 
Miller-Rabin test for primality, 
197 
minimum, 80 
Minkowski, 231 
Minkowski’s Theorem, 222 
mod(x,m), 43 
modulus, 44 
moebius, 406 
Moebius μ function, 404
alternate definition, 405 
multiplicative, 414 
Moebius inversion formula, 406 
monkeys, see pirates 
monoid 
commutative, 411 
Mordell equation, 37, 255 
finitely many integer points, 
255 
rational points, 258 
special cases, 98, 256, 257, 308 
visualization, 98 
Mordell’s theorem, 258 
μ(n), see Moebius μ function
multiplicative function, 320 
Euler’s function as, 130 
Legendre symbol as, 283, 293 
Moebius function as, 414
preserved by Dirichlet 
product, 412 
preserved by inversion, 412 
preserved by summation, 329 
mutually coprime, 75 
application of CRT, 68, 179 
combine solutions, 89 
definition, 17 
in Pythagorean triples, 31 
needed for CRT, 61 
mutually relatively prime, see 
mutually coprime 
Möbius, 404


natural numbers, see numbers, 
counting 
Newton’s method, 92 
next_prime, 155 
norm, 224, 240 
number 
abundant, 334 
Carmichael, see Carmichael 
numbers 
composite, 71 
deficient, 334 
Fermat, see Fermat numbers 
Frobenius, see conductor 
irrational, see irrational 
number 
k-perfect, 334 
Mersenne, 187 
perfect, see perfect numbers 
prime, see prime 
pseudoperfect, 334 
pseudoprime, see pseudoprime 
rational, see rational number 
Skewes’, 374 
superabundant, 334 
weird, 334 
number field, 204 
numbers 
amicable, see amicable 
numbers 
complex, 237, 273 
counting, 1 
Fibonacci, 18 
natural, see numbers, 
counting 


ω(n), 410
operation 
associative, 113 
example where fails, 113 
binary, 112
closed, 113 
commutative, 118 
opposite parity, see parity, 
opposite 
order 
of a group, 116 
of a group element, 116 


parametrization, 249 
parity 
big problems reduce to 
checking, 297, 299, 313 
opposite, 33 
same, 33 
partition 
of a number, 459 
of sets, 48, 102, 131 
Pell’s equation, 264 
perfect numbers, 332, see also 
abundancy index 
and Mersenne primes, 332 
characterization of even, 333 
in Euclid’s Elements, 332 
odd, 338-340 
and abundancy index, 340 
currently known criteria, 
340 
φ(n), see Euler φ function
π(x), see prime counting function
π(x)
Picasso 
quote, xi 
pigeonhole principle, 116 
pirates, 69 
points 
adding, 262 
doubling, 262 
rational on conics, 248 
rational on elliptic curves, see 
elliptic curves, Mordell’s 
theorem on rational 
points 
Pollard rho factorization, 206 
PolyMath Projects, 399 
polynomial 
prime-generating, 73 
positive density, see density, 
positive 
powers, see exponentiation (mod 
n) 
PreTeXt, vii 
prime, 71 
as conjectured from concept
of relatively prime, 19 
constellation, 400 
factorial, 400 
Fermat, see Fermat prime 
Gaussian, 239 
visualization, 239 
Germain, see Germain primes 
harmonic series, 432 
Mersenne, see Mersenne 
primes 
primorial, 400 
races, 387 
relatively, see coprime 
repunit, 83 
safe, 178, 309 
prime counting function π(x), 367
explicit formula, see Riemann
explicit formula for π(x)
Landau (Big Oh) 
computation, 377 
not useful formula, 368 
prime number theorem, 375 
elementary proof, 375 
prime_divisors, 80 
prime_pi, 368 
prime_range, 72 
primes 
arithmetic progressions of, 
393 
cousin, 400 
in an arithmetic progression, 
392 
proof of infinitude of, 75, 388, 
392 
sexy, 400 
twin, see twin primes 
primitive root, 138, 282 
characterization of, 139 
number of, 143 
primes possess, 145 
testing for, 140 
use in solving congruences, 
147 
primitive_root, 148 
primorial, 395, 432 
prime, see prime, primorial 
print, 65 
proof 
by contradiction, 2 
by contrapositive, 2 
by induction, 3, 79 
another easy example, 4 
by infinite descent, 35, 38 
direct, 5
proper divisor, see divisor, proper 
pseudoperfect number, see 
number, pseudoperfect 
pseudoprime, 191 
infinitely many, 197 
strong, see strong 
pseudoprime 
public-key cryptography, 161 
Pythagorean theorem, 28, 31 
Pythagorean triple, 31 
characterization of primitive, 
33 
primitive, 31 
group operation, 241 
Python, xv, 7 
comments, 72 
indexing, 10 
loop, 11 
Pépin’s test, 186, 306 


Qin Jiushao, 61, 69 
QR, see quadratic residue 
quadratic congruences, see 
congruences, quadratic 
quadratic forms, 259 
quadratic formula, 273 
quadratic nonresidue, 278 
quadratic reciprocity, 300 
alternate form, 301, 302 
applications of, 305 
cryptography, 307 
factoring, 305 
primality testing, 306 
many proofs, 300, 315 
meaning, 301 
proof of, 309 
quadratic residue, 278, see also 
Legendre symbol, see 
also quadratic reciprocity 
consecutive ones, 291 
Eisenstein criterion, 298 
Euler criterion, 286 
group, 282 
visualization, 284 
quadratic sieve, 211 
quadratic_residues, 279 
quaternions, 242 
quotient, 9 


range, 11, 149 
rational number, 25 
reify, 287 

relation, 43 
equivalence, see equivalence
relation 
relatively prime, see coprime 
remainder, 9 
connection to congruence, 44 
repunit, 83 
residue 
(mod n), 47 
quadratic, see quadratic 
residue 
residues 
complete system of, 48 
least absolute, 48, 49 
least nonnegative, 48, 49 
Riemann, 419 
Riemann explicit formula for π(x),
454 
Riemann Hypothesis, 447 
consequences of, 456 
ring, 107 
example of non-unique 
factorization domain, 85 
example of unique 
factorization domain, 1, 
240, 257 
of arithmetic functions, 412 
of integers (hint of), 237, 245, 
260, 267 
RSA, see encryption method, RSA 


Sage, xi, 6 
cell server, vii 
cells, 6 
get worksheet, 6 
interactive help, 62 
interacts, xiv, 209 
Sage notes 
about, xv, 7 
list of, 461 
SageMath, see Sage 
same parity, see parity, same 
secret sharing, 179 
set partition, see partition of sets 
sets, 112 
sieve 
of Eratosthenes, 76 
quadratic, 211 
σ(n), see sum of divisors functions
sigma, 328 
σ_k(n), see sum of divisors
functions 
Skewes’ number, 374 
solve_mod, 276 
square root modulo n, 220 
preliminary exploration, 87
Stigler’s law of eponymy, 265, see 
also eponymy, Boyer’s 
law of 
example, 15, 104, 265, 404 
strong pseudoprime, 196 
sum of divisors functions, 327 
long-term average, 355, 362 
sums of squares, 213, 237, 322 
full statement, 232 
insane fact concerning, 326 
long-term average, 326, 347 
more than two squares, 241 
primes as, 222 
visualization, 215 
Zagier one-sentence proof, 233 
superabundant number, see 
number, superabundant 
system of congruences, 61, 66 
linear fully solved, 66 


table 
addition, 107 
multiplication, 108 
τ(n), see sum of divisors functions
Taylor series 
in Hensel’s Lemma, 90 
proving Euler’s formula, 444 
Tertullian 
quote, xiii 
Thabit ibn Qurra, 337, 343 
Θ(x), see function, Chebyshev
theta 
trapdoor, 169 
trial division, 200 
algorithm, 200 
trial factorization, see trial 
division 
try/except, 215 
tuple, 10, 45, 276 
twin prime 
conjecture, 396 
constant, 398 
twin primes, 396 
and Fermat factorization, 203 
type, 45 


units, 123
examples, 123
group of, 122, 177
modulo n, see units, group of
quadratic residues quotient
group of, 282
quadratic residues subgroup
of, 282, 287
visualization
Euler’s theorem, 138
exponentiation (mod n), 110
Fermat’s little theorem, 137,
194

Gaussian primes, 239 
Mordell equation, 98 
quadratic residue, 284 
Riemann zeta function, 445 
sums of squares, 215 


Waring’s Problem, 243 
weird number, see number, weird 
well-defined, 47 
exponentiation (mod n) not 
an example, 48 
well-ordering principle, 2 
not equivalent to induction, 3 
Euclid implicitly assuming, 13 
proof of division algorithm 
using, 10
proof of Euclidean algorithm
using, 14
use in infinite descent, 35
use to define order of group 
element, 116 
Wilson’s theorem, 95 
false for composite moduli, 
100 


xgcd, 15 
Yazdi, 337, 343, 344


zero density, see density, zero 
zeta function, 419 
alternating, see Dirichlet eta 
function 
special values of, 424 
visualization, 445 
Zhang, 399 
Number theory is a beautiful subject, where intuition from our earliest years can
lead to subtle, yet still unsolved problems. Number Theory: In Context and
Interactive covers standard topics such as systems of congruences, primitive roots,
and arithmetic functions, while encouraging a sense of wonder with graphical and
handwritten explorations right up through stating the Riemann Hypothesis. The
online version of this book has dozens of interactive graphics and other exploratory
code using the open source software SageMath.


Karl-Dieter Crisman has taught Number Theory to undergraduates at Gordon College for 15
years. Their response to free interactive computation to help solidify these concepts has been
overwhelming and gratifying. Their support, and that of the SageMath and PreTeXt open source
communities, led to the creation of this book.


“An invaluable resource for my students.” 


“The embedded Sage demos were particularly useful for 
helping students visualize certain concepts.” 


“I was very happy and fortunate to have your text available
[during the COVID pivot] . . . I plan to continue to use this innovative open text.”


“Really spot on for what we needed . . . I’m confident that we’ll use it again.”


Cover: In the foreground, a matrix of the powers of integers in modular arithmetic shows hidden surprises. Using 
different color schemes to represent the numbers can help us visualize theorems about the group of units. In the 
background, the prime counting function is well approximated by the undulations of the Riemann explicit formula in 
terms of the (still-mysterious) zeros of the Riemann zeta function. 